It may sound weird but watching Selling Sunset, a Netflix reality show, is my go-to way to clear my head after work.
Every episode is heavily filled with drama between real estate agents in the office and watching them fighting reminds me that I would prefer making peace as much as I can.
When we work in the office, it’s inevitable to work with others who have a different workflow than you
If there are three people in the office, there could be three different ways of managing information (datasets).
In this light, knowing the most important thing about data management will help you to increase accuracy when you cross-reference various datasets.
Ultimately, it will help you maintain your mental health and make peace with your colleagues!
Just like the other two previous posts, I talked (typed) a lot in the intro, so I will dive right in: Every dataset should have a unique value (=primary key) which works as an anchor to cross-check and/or match information between two different datasets (=spreadsheets).
By using the anchor (=primary key), you will be able to throw the net using lookup functions to get the information that is tied to the anchor.
To kick off the series, we started with a basic rule about data management. More specifically, we had a look at the importance of having standardized fields/values for spreadsheets, reports, and CRM.
For those who would like to clean up the non-standardized data, the step-by-step guide utilizing Google Spreadsheet, Microsoft Excel, and PowerQuery was shared.
Then, we took a look at the data/report structure. From the “Chick-fil-A” framework, raw data should not have merged rows, more than one piece of information in a cell, and multiple headings. Reporting-style data can be created separately that include total values.
A step-by-step guide to transform the incorrectly structured dataset (=spreadsheets) using PowerQuery was shared as well. Once the raw database is set, eye-pleasing reports can be created using Microsoft Pivot Table.
Let’s look at real-life examples:
To get the on-campus addresses for students, I would need a list from the Residence Life and Housing office and merge it with the dataset that I have.
Similarly, to see who’s graduating this term, I need to merge the report from the Registrar with the current students list that I have.
One of the easiest ways to cross-reference the data is using LOOKUP formulas in Google Spreadsheets and Microsoft Excel. Among the LOOKUP family, my personal favorite is XLOOKUP because of its simplicity of use.
To understand how XLOOKUP works, remember the anchor metaphor:
“You need an anchor (=primary key) and throw the net where the anchor is and gather information that is attached to the anchor.”
As you can imagine, choosing the right anchor (=primary key) is the most important thing when using the XLOOKUP function. In higher education, students are given student ID and it is the perfect anchor to use.
In the e-commerce industry, each order is given an order ID, each customer is given a customer ID and each product has SKU numbers- these are the examples of the anchors we need to have in every dataset.
This is why I would follow up with my colleagues when the dataset does not include student IDs.
If you can’t get the IDs, good practice is to merge two different values to create a unique key (primary key). For example, you can merge the first and last names of the students to have primary keys. Why can’t you use just first name or last name?
Let’s look at the example below and learn more about the primary key.
Here’s an example of BLACKPINK University. Suppose that the dataset on the left is what you manage and the list on the right is what you got from the Res Life.
If you use the first name as an anchor, you may miss Rose and Lisa because your list shows a preferred name and the Res Life list shows a legal name.
If you use the last name as an anchor, Jisoo and Jennie’s information can be messed up as both of them have the same last name.
This example illustrates why having the correct datatype of the primary key is important when cross-referencing different datasets.
How to use XLOOKUP in Microsoft Excel and Google Spreadsheet
1. To use the XLOOKUP function, you will start by typing “=XLOOKUP(“
2. Click the anchor (=primary key) which is a student ID in this example.
3. Then, throw the net where the anchor is by dragging down column A except for the header. Here, you must make it an absolute reference by using ‘F4’ on the keyboard for PC and ‘fn + F4’ for Mac. I will explain later why this step is necessary.*
4. Lastly, gather what you need to get (on-campus address) by dragging down column D except for the header. Again, you must make it an absolute reference by using ‘F4’ on the keyboard for PC and ‘Fn + F4’ for Mac. I will explain later why this step is necessary.*
5. When you close the XLOOKUP function, you can get the on-campus address for Jisoo. Note that student IDs here are not numbers, they are texts as you cannot add/subtract student IDs. If your XLOOKUP result shows #N/A, I would recommend double-checking the data type in both datasets and ensuring they are texts.
6. To get the on-campus addresses for other BLACKPINK members, hover over to the edge of the cell and drag it down to activate the auto-fill.
*Auto-fill function automatically updates the function and this is why having the absolute reference is important to look at the same range for all data
Unlike PowerQuery, the XLOOKUP formula can be used in Windows, Mac, Microsoft Excel, and Google Spreadsheet so I hope you follow the steps above to use it if you haven’t!
What if you followed the steps in my previous posts to use PowerQuery to automatically get the numbers for reports by placing raw files in a certain network folder?
How can you make sure that any staff member including student workers can open and/or refresh the PowerQuery to do the same job?
Check out below to learn how to update the file path of the query which will allow all team members who have access to the network folders to utilize the queries that are built by you! They will appreciate your expertise in data management/analysis.
How to create the modifiable (=dynamic) file path
How to create the modifiable (=dynamic) file path to share PowerQuery files with anyone who has access to the Shared drive (=Network folders):
1. Place the PowerQuery file and the folder that has a raw database in the shared drive
2. Open the file in which you utilized PowerQuery to automate the data cleanup and/or data analysis. I am using the weekly report file that I used in my previous post.
Create the new tab and copy-paste the following formula to get the current file path “=LEFT(CELL(“filename”,A1),FIND(“\[“,CELL(“filename”,A1)))” *The formula is from Oppadu.com. No worries even if it shows the #VALUE! error.
3. Save the file and you will see that the correct file path is called.
4. Type the “Current path” on top of the formula and click Insert-Table to create a table. You can rename the table name.
5. On the next column, create the “Target Path” column. Remember that the PowerQuery file we are using should get the data from the “Weekly report” folder. Type = then click the current path and type “&Weekly report” to get the target path.
6. Click Data then From Table/Range to launch the PowerQuery editor
7. The most recent version of PowerQuery will automatically detect the data type which we don’t need to get the dynamic file path.
8. To remove the data type, go to the Query Settings on the right and simply click the “X” next to the Changed Type.
9. On the formula box, type [Content][Target Path]{0} to get the formula to get the dynamic file path. [] retrieves the field that you specify and {} retrieves the value of the row you specify. The first row equals 0, not 1.
10. Confirm that the formula correctly shows the file path
11. Copy the formula to update the query with this modifiable file path
12. Click the query that you’d like to share with your colleagues, then, click Advanced Editor.
13. Currently, the query is sourcing the data from the previous folder that is not shared with the team
14. Place the pointer before the Source and type enter to add a variable named “Path” then paste the formula that we copied in step 11. Make sure to type “,” at the end of the formula so the code runs without errors
15. Delete the previous file path for a variable named “Source” then switch it to “Path” by typing “Path”
16. When you click Done you may see the error message if you used the sample query to merge the data from the folder in the recent EXCEL versions.
17. You can simply repeat the steps. Click the Sample File and Advanced Editor.
18. Click Done and you will see that the Query runs smoothly and you will find that the “Path” step is added successfully on the right navigation.
19. To do the test run, I will rename the folder as “User 020.” You can also create a new folder named “User020” and then move all the files and folders under the folder “User020”.
20. Open the Weekly report then click Data, and Refresh All. You will be able to confirm that not only the file path but also the datasets are updated successfully. This means any users accessing the Shared Drive can open/edit/refresh the PowerQuery within the file.
For me, this process cuts around 85% of the time that goes into pulling the weekly report. As this task was repetitive every week, it’s a ton of time you can save per month, per term, and per year!
It may seem challenging when you look at it for the first time, but once you set this up, it does not involve any maintenance. Hope you find this work-changing as much as I did!
Next, I will share how to get comparable data to make informed decisions at work.
Did you find it helpful?
Click the button below to check out all the Professional series!
© copyright SEVIS SAVVY 2023