2. Correctly structure datasets

We will learn from Chick-fil-A today:

Chick-fil-A and any other fast-food stores do a great job keeping each menu in each bin to serve customers better and effectively.

To create a set menu, they grab each menu from each bin and place them in a tray, rather than creating a separate bin with a set menu.

From the (Chik-fil-A) standpoint, the raw dataset should have each value as a row and should not include multiple headings, or merged cells including the total value.

Set menus (=reports) can be made by pulling each menu from the rack (=raw dataset) into each tray (=different tables or sheets).

Previously on...

In my previous article, I shared why having standardized data/value is important when it comes to data management/analysis.

Even though it doesn’t sound like a ground-breaking rule, you may find a bunch of documents, reports, and fields in your school’s data management system — whether it is Sunapsis, PeopleSoft, Slate, Salesforce, Terra Dotta, etc—that have too many ways to say one thing.

It took more than one year for me to create queries, fields, reports and standardize values to better manage the data in my office. (I use Slate, let me know if you’d like to see details about how I utilized Slate to systemize the process and standardized data)

That being said, the time that I spent building the foundation for better data management/analysis was totally worth it.

It allowed me not only to survive each semester but to save time to do the same job in the future terms and have a better picture of the recruitment, enrollment, retention, and student support programs at my current institution.

"In my humble opinion, professionals (at least in higher education) who know how to work smart are not the ones who always work overtime. Rather they are the ones who know where to invest their time now to come up with the same or even better quality work outcome in the future."

Let’s circle back to the last post, I added the first image to give readers a sneak peek of why the following image is not a good example of data management:

The importance of having correctly structured raw datasets

Spreadsheets or datasets that have multiple headings, merged cells, and multiple values in one cell are not an ideal way to manage the data. What is more, incorrectly structured data will require additional work to perform the data clean-up before data analysis.

To help you understand this better, let’s take a closer look at this photo from Chick-fil-A. Chick-fil-A is always crowded and one of the methods they use to keep up with customers is organizing each menu/sauce in each bin.

Even though I don’t work for Chick-fil-A, I can tell that each sauce, side menu, and chicken sandwich have their own bins. If employees don’t follow the rules and add more than one sauce in one bin, it will cause a delay in sorting it out and may serve wrong orders!

This is what we can see in the image: the Exit column in the image is supposed to be the field to note the “term” when the student graduated/withdrew from the college. Additional information like the graduation date should be stored in a separate column/tab/spreadsheet.

Separating one column into two in Google Spreadsheets and/or Microsoft Excel with few clicks is easy so I am not going to add step-by-step screenshots for now. Please comment down for the detailed step-by-step guide.

Here’s another example: This list of new students who will check in on campus looks great as it is color-blocked by the check-in time.

However, since the check-in time is noted as a merged row not in a separate row under the check-in time column, there is no way to filter students based on the check-in time, and dormitory, just to name a few.

How can we correctly structure the same dataset? From the Chick-fil-A framework, they would grab what they need from each bin and place them in a tray to create a set menu.

"This is what we need to do- create the raw dataset and link it to the report or table that groups/structures the raw data to your liking."

Let me show you another example— The following check in weekly report has multiple merged headings: Date, Total, Undergraduate, and Graduate.

Having multiple headings is like writing more than one menu in the label that is attached to each bin.

In conjunction with the last post’s lesson, this weekly report may serve its needs by providing numbers for that week. That being said, this formatting makes it challenging to analyze the data to see the trend and compare it with previous/future numbers.

How can we correctly structure datasets?

When it comes to revamping the wrong data structure, knowing what should be the correct heading is important. In addition, the correctly changed structure must allow new data to be added as a row (not as a column).

For example, in the weekly report example above, Applied, Accepted, Deposited, Checked in… will be the heading for each column. Then, the date and category (First Year, Transfer and Graduate) numbers will be noted in each row so new data can be added as a new row.

Then, we can create the pivot table based on this updated dataset which will allow us to have a similar look to what we had before but in a better way. Note that Undergraduate = First Year + Transfer numbers. Thus, the undergraduate category will be removed from the raw data and added as a group when we create a report (pivot) table.

The highlight of this process is that when you add new data as a row, the pivot table will be automatically updated and as you filter the date, you will get the weekly numbers automatically which will save you a ton of time.

PowerQuery is my go-to tool to transform the wrong data structure and I will share step-by-step below. Comment down to get the same example file if you’d like to follow along.

How to change multiple headings into one

1. Transform the data into the table. The table will automatically unmerge the column to correctly format the data.

2. You can change the style and name of the table

Click the arrow to see different table styles

3. Click the Data-From Table/Range to launch PowerQuery

4. Check your data see the pattern and fill the empty cells by clicking the Transform- Fill- Up/Down menu to automatically fill the empty cells

*If you have more than two merged headings, you will need to transpose the row/column to have real heading in columns, merge columns into one heading then unpivot the other columns to correctly structure the data.

5. Use first row as headers to have a header. Yay! Now we have a dataset with one heading for each column!

Use first row as a header

6. Since the total value including Undergraduate (= First Year + Transfer) numbers are repeated, repeated rows need to be deleted. To do so, filter the column to show values that do not contain “total” and “undergraduate“

Filter out the total and undergraduate category

7. Right-click the column to rename the columns if needed. I also formatted the date column to show the date without time information

8. To apply the change to the raw data, click the arrow button on the Close & Load then click Close & Load To…

Close and Load to... — Close and Load to…

9. Check Only Create Connection to create the connection for now so we don’t necessarily create queries each time we launch PowerQuery

10. Under the Queries & Connections, right-click the query and click Load To… then Table

11. Ta-da! The previous dataset with merged cells, multiple headings for each column, and repeated rows are now cleaned.

Now let’s add a new dataset and see how this changes the process to pull the weekly check-in report more efficienctly!

How to get numbers for reports automatically

Since the old dataset did not have the correct data structure, to add a new dataset, you had to copy and paste the old table to change the numbers every week and adjust the formulas to get the sum for each column.

With the correctly structured dataset, all you need to do is add the enrollment data in a new row and refresh it. Then the pivot table will automatically give you the updated numbers.

To get this week’s check-in numbers, we are going to create a pivot table based on the raw dataset (a.k.a place each menu in a tray to create a set menu).

How to add new dataset and get numbers automatically

1. Click anywhere in the cleaned-up query and click Insert-Pivot Table

2. Drag and drop Date information to the “Filters” and move each heading to “Values” and Category to “Rows”. You can style the pivot table to your liking.

3. Multi-select First-Year and Transfer then right-click them and select “Group.” You can use F2 on Windows, and Fn+ F2 on Mac to rename the group to say “Undergraduate”

4. Go back to the raw dataset and add a new set of data in a row.

5. Then go back to the pivot table and right-click it and click Refresh

6. You will see the new date is added in a filter and as soon as you click the date, numbers in the table will be updated

To sum up, by using the correctly structured dataset, you can keep the raw data for data management and use separate tables or charts for data analysis.

Let’s recap the Chick-fil-A rule. Keep each sandwich/sauce (data) in each bin (cell) and use only one label (heading) for each bin.

Then, to create a set menu (reports; tables, charts), do not merge bins to create, instead grab each menu from each bin (raw dataset) and place a set menu (reports, tables, charts) in a separate tray.

"In the office, you can do so by maintaining the raw data and linking it to a table/report to organize/group the data for data reporting. "

In my next post, I will share the most important thing to have in every dataset.

Did you find it helpful?

Click the button below to check out all the Professional series!