3. Utilize the primary key and Lookup

It may sound weird but watching Selling Sunset, a Netflix reality show, is my go-to way to clear my head after work.

Every episode is heavily filled with drama between real estate agents in the office and watching them fighting reminds me that I would prefer making peace as much as I can.

From X.com @XtineQuinn

When we work in the office, it’s inevitable to work with others who have a different workflow than you

If there are three people in the office, there could be three different ways of managing information (datasets). 

In this light, knowing the most important thing about data management will help you to increase accuracy when you cross-reference various datasets.

Ultimately, it will help you maintain your mental health and make peace with your colleagues!

Just like the other two previous posts, I talked (typed) a lot in the intro, so I will dive right in: Every dataset should have a unique value (=primary key) which works as an anchor to cross-check and/or match information between two different datasets (=spreadsheets)

By using the anchor (=primary key), you will be able to throw the net using lookup functions to get the information that is tied to the anchor.

To kick off the series, we started with a basic rule about data management. More specifically, we had a look at the importance of having standardized fields/values for spreadsheets, reports, and CRM.

For those who would like to clean up the non-standardized data, the step-by-step guide utilizing Google Spreadsheet, Microsoft Excel, and PowerQuery was shared.

Then, we took a look at the data/report structure. From the “Chick-fil-A” framework, raw data should not have merged rows, more than one piece of information in a cell, and multiple headings. Reporting-style data can be created separately that include total values.

A step-by-step guide to transform the incorrectly structured dataset (=spreadsheets) using PowerQuery was shared as well. Once the raw database is set, eye-pleasing reports can be created using Microsoft Pivot Table.

Let’s look at real-life examples:

To get the on-campus addresses for students, I would need a list from the Residence Life and Housing office and merge it with the dataset that I have.

Similarly, to see who’s graduating this term, I need to merge the report from the Registrar with the current students list that I have.

One of the easiest ways to cross-reference the data is using LOOKUP formulas in Google Spreadsheets and Microsoft Excel. Among the LOOKUP family, my personal favorite is XLOOKUP because of its simplicity of use.

To understand how XLOOKUP works, remember the anchor metaphor: 

“You need an anchor (=primary key) and throw the net where the anchor is and gather information that is attached to the anchor.”

As you can imagine, choosing the right anchor (=primary key) is the most important thing when using the XLOOKUP function. In higher education, students are given student ID and it is the perfect anchor to use. 

In the e-commerce industry, each order is given an order ID, each customer is given a customer ID and each product has SKU numbers- these are the examples of the anchors we need to have in every dataset. 

This is why I would follow up with my colleagues when the dataset does not include student IDs.

If you can’t get the IDs, good practice is to merge two different values to create a unique key (primary key). For example, you can merge the first and last names of the students to have primary keys. Why can’t you use just first name or last name?

Let’s look at the example below and learn more about the primary key.

An example of BLACKPINK University

Here’s an example of BLACKPINK University. Suppose that the dataset on the left is what you manage and the list on the right is what you got from the Res Life

If you use the first name as an anchor, you may miss Rose and Lisa because your list shows a preferred name and the Res Life list shows a legal name.

If you use the last name as an anchor, Jisoo and Jennie’s information can be messed up as both of them have the same last name.

This example illustrates why having the correct datatype of the primary key is important when cross-referencing different datasets. 

How to use XLOOKUP in Microsoft Excel and Google Spreadsheet

1. To use the XLOOKUP function, you will start by typing “=XLOOKUP(“ 

Type "=XLOOKUP"

2. Click the anchor (=primary key) which is a student ID in this example.

Click the primary key

3. Then, throw the net where the anchor is by dragging down column A except for the header. Here, you must make it an absolute reference by using ‘F4’ on the keyboard for PC and ‘fn + F4’ for Mac. I will explain later why this step is necessary.*

Select column that has the primary key

4. Lastly, gather what you need to get (on-campus address) by dragging down column D except for the header. Again, you must make it an absolute reference by using ‘F4’ on the keyboard for PC and ‘Fn + F4’ for Mac. I will explain later why this step is necessary.*

Drag column that has the information you'd like to get

5. When you close the XLOOKUP function, you can get the on-campus address for Jisoo. Note that student IDs here are not numbers, they are texts as you cannot add/subtract student IDs. If your XLOOKUP result shows #N/A, I would recommend double-checking the data type in both datasets and ensuring they are texts. 

XLOOKUP automatically grabs the information tied to the primary key

6. To get the on-campus addresses for other BLACKPINK members, hover over to the edge of the cell and drag it down to activate the auto-fill.

*Auto-fill function automatically updates the function and this is why having the absolute reference is important to look at the same range for all data

Utilize the auto-fill function to fill the data

Unlike PowerQuery, the XLOOKUP formula can be used in Windows, Mac, Microsoft Excel, and Google Spreadsheet so I hope you follow the steps above to use it if you haven’t!

What if you followed the steps in my previous posts to use PowerQuery to automatically get the numbers for reports by placing raw files in a certain network folder?

How can you make sure that any staff member including student workers can open and/or refresh the PowerQuery to do the same job

Check out below to learn how to update the file path of the query which will allow all team members who have access to the network folders to utilize the queries that are built by you! They will appreciate your expertise in data management/analysis. 

How to create the modifiable (=dynamic) file path

How to create the modifiable (=dynamic) file path to share PowerQuery files with anyone who has access to the Shared drive (=Network folders):

1. Place the PowerQuery file and the folder that has a raw database in the shared drive

Move files and folders to the desired place
Move files and folders to the desired place

2. Open the file in which you utilized PowerQuery to automate the data cleanup and/or data analysis. I am using the weekly report file that I used in my previous post.

Create the new tab and copy-paste the following formula to get the current file path “=LEFT(CELL(“filename”,A1),FIND(“\[“,CELL(“filename”,A1)))” *The formula is from Oppadu.com. No worries even if it shows the #VALUE! error.

Enter the formula to get the file path
Enter the formula to get the file path

3. Save the file and you will see that the correct file path is called.

Save the file to see the called file path
Save the file to see the called file path

4. Type the “Current path” on top of the formula and click Insert-Table to create a table. You can rename the table name.

Create the table
Create the table

5. On the next column, create the “Target Path” column. Remember that the PowerQuery file we are using should get the data from the “Weekly report” folder. Type = then click the current path and type “&Weekly report” to get the target path.

Create the target path column
Create the target path column

6. Click Data then From Table/Range to launch the PowerQuery editor

Launch the PowerQuery
Launch the PowerQuery

7. The most recent version of PowerQuery will automatically detect the data type which we don’t need to get the dynamic file path.

Delete the auto-corrected Data Type
Delete the auto-corrected Data Type

8. To remove the data type, go to the Query Settings on the right and simply click the “X” next to the Changed Type.

Delete the Changed Type step
Delete the Changed Type step

9. On the formula box, type [Content][Target Path]{0} to get the formula to get the dynamic file path. [] retrieves the field that you specify and {} retrieves the value of the row you specify. The first row equals 0, not 1.

Call the modifiable file path
Call the modifiable file path

10. Confirm that the formula correctly shows the file path

Confirm the file path
Call the modifiable file path

11. Copy the formula to update the query with this modifiable file path

Copy the formula
Copy the formula

12. Click the query that you’d like to share with your colleagues, then, click Advanced Editor.

Click the "Advanced Editor"
Click the “Advanced Editor”

13. Currently, the query is sourcing the data from the previous folder that is not shared with the team

Current file path
Click the “Advanced Editor”

14. Place the pointer before the Source and type enter to add a variable named “Path” then paste the formula that we copied in step 11. Make sure to type “,at the end of the formula so the code runs without errors

Paste the Excel formula to define the variable "Path"
Paste the Excel formula to define the variable “Path”

15. Delete the previous file path for a variable named “Source” then switch it to “Path” by typing “Path”

Change the file path with the variable "Path"
Create the variable named “Path”

16. When you click Done you may see the error message if you used the sample query to merge the data from the folder in the recent EXCEL versions.

You may see error messages
You may see error messages

17. You can simply repeat the steps. Click the Sample File and Advanced Editor.

Switch the file path for the sample query
Switch the file path for the sample query

18. Click Done and you will see that the Query runs smoothly and you will find that the “Path” step is added successfully on the right navigation.

The step "Path" is added
The step “Path” is added

19. To do the test run, I will rename the folder as “User 020.” You can also create a new folder named “User020” and then move all the files and folders under the folder “User020”.

Test the query
Test the query

20. Open the Weekly report then click Data, and Refresh All. You will be able to confirm that not only the file path but also the datasets are updated successfully. This means any users accessing the Shared Drive can open/edit/refresh the PowerQuery within the file.

Refresh the query to see if it worked!
Refresh the query to see if it worked!

For me, this process cuts around 85% of the time that goes into pulling the weekly report. As this task was repetitive every week, it’s a ton of time you can save per month, per term, and per year!

It may seem challenging when you look at it for the first time, but once you set this up, it does not involve any maintenance. Hope you find this work-changing as much as I did!

 

Next, I will share how to get comparable data to make informed decisions at work.

Did you find it helpful?

Click the button below to check out all the Professional series!

© copyright SEVIS SAVVY 2023