5. Ask questions

In order to make data-driven decisions on international recruitment and enrollment, having comparable data is necessary

Since my institution is under the State University of New York (SUNY) System, I got datasets from data.ny.gov but here are some other sources you might want to look at to get international student enrollment data:

Today, we will look into how to transform a PDF file into a dataset that you can analyze for your needs. It will be a stepping stone to make informed decisions on recruitment/enrollment/retention strategies.

Lay a stepping stone to make data-driven decision

Click the “Previously on…” to be reminded why we are utilizing the PDF files or if you’d like to download the same PDF file and follow along.

In my previous post, I illustrated how to use publicly available data using API to get real-time data.

As a college administrator, I, then, used the data to learn how SUNY categorizes 64 campuses and compare the number of non-resident alien students for colleges that are in the same group as my institution.

Although this dataset was able to give me the enrollment trend for each SUNY institution type, it was not what exactly I was looking for.

If you remember, I explained that non-NY-resident students include not only international students but also other nonimmigrant students or undocumented students.

International enrollment numbers were available as interactive, visualized data by Tableau. The silver lining is that I could save the Tableau as an image or PDF file.

In this post, I will use SUNY Brockport’s Fall 2017 data from the Country of Citizenship of SUNY Students (International): https://tableauserver.suny.edu/t/IRPublic/views/HometownofStudentsFastFacts2022/InternationalOnly

Follow the steps below to transform PDF files into queries using PowerQuery.

How to change PDF files into datasets that you can play with:

1. Click Get Data- From File- From PDF (if you would like to transform multiple PDF files, click Get Data- From From File- From Folder and click the folder that has multiple PDF files)

Get Data-From File-From PDF

2. Choose the file and click Import

Import data

3. Then click Transform Data

Transform Data

4. PowerQuery returns a table with different types of data including a table. Click the row that has the “Table” under the third column named “Kind” then click Add as New Query

Include a table

5. The left image is from a PDF file and the right screenshot is from the query. As you can see all data is correctly exported as a query

Data is exported as a query

6. Utilize Detect Data Type under the Transform tab to correctly change the data type of each column. Updating the data type is necessary to customize the query to your liking and needs

Detect Data Type

7. One thing to keep in mind is that the “Total” value needs to be deleted as we don’t need it in the raw database. To learn more about this, go to the Learn from Chik-fil-A post.

To remove total values, right-click the column, click Text Filters- Does Not Contain and type “Total

Does not contain the text "Total"

Voilà! You get the query from the PDF file and you can use PivotTable and PivotChart to analyze the data.

It’s worth noting that to see the enrollment trend though, more than two datasets from different terms and campuses are needed. For this reason, term and campus information needs to be added to the database.

Adding the campus and term information is easy if you launch PowerQuery based on data from a folder as the file name is automatically added as a column. 

However, if you get data directly from a PDF file, the file path needs to be added manually. Click the following to learn how to add the file path as a column manually step-by-step:

How to add the file path as a column manually using Advanced Editor:

1. Click Home- Advanced Editor

Advanced Editor
Advanced Editor

2. Add a column that shows the file path

  • Create the “Path” variable and copy and paste the file path. Don’t forget to close the clause with a comma (,)
  • Go to the Source variable and switch the file path to the variable “Path
  • Copy and paste the following: #”Add Columns” = Table.AddColumm(Table1, “Term”, each Path)
  • Delete the previous call under the “in” and add type #”Add Column” to call the added column
Add a column that shows the file path
Add a column that shows the file path

3. Now the Term column is added but since the entire file path is added to the column, we will need to trim it.

To do so, click Home-Split Column-By Positions and specify the position to have “Fall 2017 Brockport.pdf”

Split columns
Split columns

4. Click Home- Replace Values to remove “.pdf” at the end

Replace values
Replace values

5. To get what we need— term and the campus information, “Fall 2017 Brockport” needs to be split into two

6. Click Home-Split Column-By Number of Characters to split the values. As you can see in the image below, it will split the column after the 9th value, only once from the left

Split columns by number of characters
Split columns by number of characters

7. Remember that there is a space between Fall 2017 and Brockport. Utilize the Replace Values to trim the space then rename the column.

Rename the column
Rename the column

8. Multi-select two columns— Term and Campus— then move them to the beginning of the query

Move two columns
Move two columns

9. Ta-da the raw dataset of Fall 2017 Brockport international enrollment data is ready to be analyzed!

The query is ready
The query is ready

10. You can also rename the query name by right-clicking the query on the left navigation bar

Rename the query
Rename the query

Yay- it’s actually awesome that we can transform the PDF file to a query that we can customize, but, let’s remember that it is actually the first step

With the raw dataset, we can utilize PivotTable or PivotChart to compare numbers with local data in the office. Then, ask questions to reflect and plan for better recruitment and/or retention strategies. (Refer to the previous post titled to learn how to use PivotTable.

Here are some examples of the questions that I asked myself to start the data-driven decision-making process for my office. 

  • Does the enrollment trend at my school (=local data) reflect the SUNY trend per institution type? If not, what are the unique characteristics of the local data?
  • If we outperformed in certain academic years, what did we do differently? For example, what are the sending countries or education levels that helped us have more enrollment numbers?
  • If we underperformed in certain academic years, what can we learn from other colleges? For example, what are the sending countries or education levels where other colleges have more students than us?

"Does the enrollment trend at my school (=local data) reflect the SUNY trend per institution type? If not, what are the unique characteristics of the local data?"

"I hope you can all see how having the data management/analysis mindset is for all higher education professionals regardless of the title. "

Additionally, I hope the step-by-step guides in each post helped you to think “It’s not as scary as ‘lookup’ or ‘PowerQuery’ or ‘PivotTable’ sound,” just like my students told me. 

A new Professional series will be back with Slate 101 tips for university staff but please feel free to comment if there’s anything you’d like me to touch upon when it comes to higher education administration/data analysis.

Did you find it helpful?

Click the button below to check out all the Professional series!

© copyright SEVIS SAVVY 2023