Future Data 2020
Many winter moons ago, I (virtually) attended Future Data 2020, a conference about the next generation of data systems. During the conference, I watched an interesting talk by Tristan Handy, founder and CEO of Fishtown Analytics, called “The Modern Data Stack: Past, Present, and Future.” In the talk, Tristan discussed a so-called […]
Introduction
It seems as if people are split on pie charts: either you passionately hate them, or you are indifferent. In this article, we are going to explain why pie charts are problematic and, if you fall into the latter category, what you can do when creating pie charts to avoid upsetting those […]
The goal of this blog post is to compile little tidbits and code snippets that address common issues when programming for data analysis in Python.
General Snippets
Difference between JSON and XML
This page gives a great example of the difference between data in JSON format and XML format. It shows the exact same […]
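To make the JSON-versus-XML difference concrete, here is a small sketch (not the example from the page the post links to) that writes one made-up record in both formats using Python's standard `json` and `xml.etree.ElementTree` modules. The record and the element names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
employee = {"name": "Ana", "title": "Analyst", "skills": ["SQL", "Python"]}

# JSON: maps directly onto Python dicts and lists.
as_json = json.dumps(employee)

# XML: the same data expressed as nested, named elements.
root = ET.Element("employee")
ET.SubElement(root, "name").text = employee["name"]
ET.SubElement(root, "title").text = employee["title"]
skills = ET.SubElement(root, "skills")
for skill in employee["skills"]:
    ET.SubElement(skills, "skill").text = skill
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

JSON maps naturally onto dictionaries and lists, while XML makes you choose element names (here `skills` and `skill`) to express the same list.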
A/B testing (sometimes called split testing) is a way of comparing two versions of a web page, email newsletter, or other digital content to see which one performs better. A company will compare two web pages by showing the two variants (let’s call them A and B) to similar visitors at the same time. Typically, the company […]
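The excerpt stops before saying how “performs better” is judged. One common approach (our choice here, not necessarily the one the post uses) is a two-proportion z-test on conversion rates. This is a minimal sketch using only Python's standard library; the visitor and conversion counts are made up.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) comparing the conversion rates of A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 5,000 visitors converted on A, 250 of 5,000 on B.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test assumes reasonably large samples in both groups.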
The Situation
Kakes+, a Pennsylvania company that makes terribly unhealthy small pies/cakes, believes that its machines are overfilling its blueberry pies. Kakes+ wants to test this statistically and has recruited you to come up with a data-backed answer. The pies should weigh 8 ounces each.
Step 1: Collect Data
You need to weigh the pies […]
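The excerpt ends before naming a test, so this is an assumption: a one-sided, one-sample t-test fits the setup (is the mean weight greater than 8 ounces?). The sketch uses only the standard library. The weights are invented, and 1.833 is the one-sided 5% critical value of the t distribution with 9 degrees of freedom (10 pies minus 1).

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: mean == mu0 against H1: mean > mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical pie weights in ounces (made up for illustration).
weights = [8.1, 8.3, 7.9, 8.2, 8.4, 8.0, 8.2, 8.3, 8.1, 8.2]
t = one_sample_t(weights, mu0=8.0)

# One-sided critical value for alpha = 0.05 with n - 1 = 9 degrees of freedom.
T_CRIT = 1.833
print(f"t = {t:.2f}; overfilling suggested: {t > T_CRIT}")
```

If SciPy is installed, recent versions can return a p-value directly with `scipy.stats.ttest_1samp(weights, 8.0, alternative="greater")`.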
This post is updated as appropriate, so keep checking back!
Table of Contents
Errors when installing Python with Homebrew
  Error: Permission denied
  Error: Could not symlink bin
Errors when writing/running Python code
  TypeError
  NameError
  IndentationError
  SyntaxError
  IndexError
  KeyError
  ValueError
Installing Python
Error: Permission denied @ dir_s_mkdir – /usr/local/Frameworks
Check out this article for help: https://github.com/Homebrew/homebrew-core/issues/19286 […]
Tables are one of the most important features of Excel, but they are often overlooked. Using Tables and keeping your analyses in Excel connected will drastically increase your efficiency. Let’s start by understanding how Tables work with PivotTables. We’re going to use an R dataset called DoctorContacts. Download the .csv file using this link (and save […]
DB Browser for SQLite (also called SQLite Browser for short) is an excellent tool for practicing SQL without having to connect to a live server. This post will walk through how to install, open, and use SQLite Browser.
Install SQLite Browser
Go to the SQLite Browser website and choose the download for […]
Resources for Learning Git
Atlassian’s GitFlow Page
This is a short tutorial article focused on typical Git sequences.
try.github.io
A list of Git resources, broken down by type.
Grouping in PivotTables is a way of combining data to perform analyses without having to use functions. You can group numeric columns to turn them into categories, you can group date columns by date ranges to get even intervals, and you can group text columns to put together similar values. We’ll go through all three […]
Step 1: Install Anaconda
Go to this download webpage on Anaconda’s site. Choose the correct link for your operating system, and then go through the installation process.
Step 2: Prepare a folder for notebooks
Choose or create a folder on your computer where you will store all Jupyter notebook files. Make sure you choose a place […]
Most people don’t know that bubble plots even exist in Excel. In this blog post, we’ll walk through how to take advantage of these very effective charts! They are great for comparing three quantitative variables at once. For a nice intro to bubble plots, check out Hans Rosling’s very famous TED Talk. If you don’t […]
Python Resources
Driven Data
This is a great way to tackle machine learning and Python at the same time! I also like Driven Data because it gives you a project with a specific goal. I think that’s the best way to learn!
Automate the Boring Stuff with Python
This is another project-based site that also is […]
Joins in SQL
Joins are one of the most important (if not THE most important) concepts in SQL. If you take the time to solidly understand how joins work, you’ll be in an excellent place for writing queries. So, let’s dive in!
Join Definitions
Joining tables in SQL is a way of combining them. It […]
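As a quick, hedged illustration of what “combining” tables means, this sketch uses Python's built-in `sqlite3` module with an in-memory database, so no server is needed. The `customers` and `orders` tables and their rows are hypothetical. It contrasts an INNER JOIN, which keeps only matching rows, with a LEFT JOIN, which keeps every row from the left table.

```python
import sqlite3

# In-memory SQLite database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    -- Order 13 points at customer 4, who does not exist.
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 10.0), (12, 2, 5.0), (13, 4, 7.0);
""")

# INNER JOIN keeps only rows with a match on both sides.
inner = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN keeps every customer; missing orders come back as NULL (None).
left = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner)  # [('Ana', 10), ('Ana', 11), ('Ben', 12)]
print(left)   # [('Ana', 10), ('Ana', 11), ('Ben', 12), ('Cy', None)]
```

Order 13 appears in neither result, because neither query keeps unmatched rows from `orders`.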
Multiple joins are one of the toughest SQL concepts – in this post we’ll decode them and review some common pitfalls. One of the best ways to learn is with an example. If you’d like to follow along, you can download a zip file containing the three tables as .csv files here, and import […]
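Here is a minimal sketch of chaining joins across three tables, again using Python's `sqlite3` with hypothetical `customers`, `orders`, and `products` tables (not the post's .csv files). It also shows one well-known pitfall, chosen by us rather than taken from the post: following a LEFT JOIN with an INNER JOIN silently drops the rows the LEFT JOIN was meant to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO products VALUES (100, 'Pie'), (101, 'Cake');
    -- Cy (customer 3) has no orders.
    INSERT INTO orders VALUES (10, 1, 100), (11, 1, 101), (12, 2, 100);
""")

# Each JOIN brings in one more table, linked through the previous one.
all_left = conn.execute("""
    SELECT c.name, p.product_name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    LEFT JOIN products AS p ON p.product_id = o.product_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Pitfall: Cy's o.product_id is NULL, so the INNER JOIN finds no product
# and Cy's row disappears, undoing the first LEFT JOIN.
mixed = conn.execute("""
    SELECT c.name, p.product_name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    INNER JOIN products AS p ON p.product_id = o.product_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(all_left)  # [('Ana', 'Pie'), ('Ana', 'Cake'), ('Ben', 'Pie'), ('Cy', None)]
print(mixed)     # [('Ana', 'Pie'), ('Ana', 'Cake'), ('Ben', 'Pie')]
```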
A list of freely available data on the web. The first list covers the sites we think are best for accessing quality datasets; below that are additional sources by category.
Best Sources
Kaggle
By far our personal favorite! There are dozens if not hundreds of quality datasets available here.
ICPSR
You have to create an account, […]
Formatting charts in Excel is no easy task. It’s time-consuming, and Excel is pretty fussy, which doesn’t make things easier. In this post I’ll give general tips for formatting charts and go over a few common scenarios.
Understand the Parts of a Chart
First things first: it’s important to get the syntax down pat […]
Excel gives you a lot of flexibility when creating files and starting projects, and we’re often asked what the “best” solution is for keeping things organized. This post will review what we recommend. As with everything in life, there may be a few exceptions where you’ll want to set up your file differently. However, this […]
What is SQL?
If you Google this question right now, you’ll get a whole slew of technical articles that aren’t very helpful for understanding what SQL is and when people use it. We’ll break that down in this blog post. This is a high-level overview – if you want to understand how to actually […]
This year (2018) is the 100th anniversary of a paper by R. A. Fisher, which introduced the statistical term “variance”. Variance is one of the toughest concepts in statistics, but it’s crucially important. Variance tells you how spread out your data are (yep, “are”; the word “data” is plural!). First, let’s get some terminology out of […]
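As a small worked sketch (the numbers are made up, and the post's own terminology may differ), variance is the average squared distance of each value from the mean. Dividing the sum of squared deviations by n gives the population variance; dividing by n − 1 gives the sample variance. Python's standard `statistics` module provides both as `pvariance` and `variance`.

```python
import statistics


def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by n (population variance).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers; the mean is 5
print(variance(data, sample=False))  # 4.0
print(variance(data))                # 32 / 7, about 4.571
# The standard library gives the same results:
print(statistics.pvariance(data), statistics.variance(data))
```

Here the squared deviations are 9, 1, 1, 1, 0, 0, 4, and 16, which sum to 32. That gives 32 / 8 = 4 for the population and 32 / 7 for a sample.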