Excel File Setup for Analysis
Excel File Setup for Analysis
by boxplotanalytics Sep 1, 2019 1:51 am
Excel gives you a lot of flexibility when creating files and starting projects, and we’re often asked what the “best” solution is for keeping things organized. This post will review what we recommend. As with everything in life, there may be a few exceptions where you’ll want to set up your file differently. However, this guide should serve you well for the vast majority of your projects.
Worksheet 1: Raw Data
The first worksheet (or “tab”) in your Excel file should be your raw data. And unless you have a specific reason to call it something else, we usually actually name the worksheet “Raw Data”. You should not modify the data in this sheet at all. Leave it exactly as you received it, always.
Worksheet 2: Cleaned Data
The second worksheet in your Excel file is the cleaned data. Again, we recommend naming this worksheet “Cleaned Data”.
To create this worksheet, copy and paste everything from the Raw Data sheet into this sheet. (Alternatively, right-click on the Raw Data sheet and choose “Make a Copy”). Then, without highlighting, create a Table. You can read more about how to create tables in our other blog post here.
After creating the table, you may begin making modifications to the data to clean it. Removing unnecessary rows, adding calculated columns and formulas, etc. Formulas should only be used if you are creating an entire new column from the formula (see next section for why).
Worksheet 3: Analysis
This worksheet, named “Analysis” will contain all of your PivotTables and calculations. While it may be tempting to put PivotTables, calculations, etc. in the Cleaned Data worksheet, we don’t consider this to be the best practice. Especially when the calculations/PivotTables are put below the dataset – this can cause serious problems later on when you are trying to select data for analyses and visualizations. It’s best to keep all of the analysis work separate from the datasets.
As mentioned above, the only exception is when you are using a formula to create an entirely new column in the Cleaned Data sheet. Those formulas can stay in the Cleaned Data sheet. All other formulas used for calculations, though, that don’t become columns in your dataset should be in the Analysis sheet and separate.
Worksheet 4: Dashboard/Presentation
This is the last worksheet you would need for a simple project (the next section after this one is only required for more complex projects). We recommend naming this worksheet “Dashboard” (or “Presentation” if it’s not really a dashboard).
If you are dealing with a large dataset or a large project, you may need to repeat the “Analysis” and “Dashboard/Presentation” tabs. For example, say you’re working with data from your company’s Facebook page. Facebook gives you over 1800 variables (columns). It would be impossible to squeeze all of this information into one presentation/dashboard effectively. And if you tried to do all of your analysis on one worksheet, that would get very messy. Instead, you might have one analysis sheet just for “Engagement” data, and one dashboard sheet for “Engagement” data. Then another analysis worksheet for “Impressions” data, and a corresponding dashboard worksheet for “Impressions” data. And so on and so forth.
So, the final worksheets in your file might look something like this (in this order):
You can even color-code the worksheets so that you can easily identify them – for example, the Engagement analysis and Engagement dashboard sheets could be green, the Impressions analysis and Impressions dashboard sheets blue, etc. Or, color coding could be done by analysis vs dashboard sheets – for example, all analysis sheets could be green and all dashboard sheets blue.
It’s also somewhat common practice to hide the analysis sheets if you plan on passing off the file to colleagues who aren’t analysts. This way they know to focus on the presentation/dashboard sheets only.
We hope this guide serves as a helpful framework for setting up your next analytics project! As mentioned previously, it may not be the best solution for every project, but it will most likely be a usable model for most projects.