4. Get comparable data

Throughout the Professional series, we had a look at the importance of standardizing data entry and having the primary key (anchor) in each dataset. 

Standardized data values allow practitioners to listen to what data tells us. Similarly, the primary key helps us to cross-reference datasets efficiently which could help us make peace with colleagues as much as possible! 

In the second post, we also took a look at how Chick-fil-A works to correctly structure the data (table). Correctly structured data do not have multiple headings, merged cells, or more than one value in a cell. 

Based on the correctly structured data, tables and charts can be created which can reflect any updates in the raw database. 

Now we are ready to fly even higher!

How to utilize public data to come out of the well and see a bigger sky

Have you ever heard of the frog in the well story? The frog thought the sky was the size of the well. It isn’t until the frog jumps out of the well that it realizes the sky is much bigger than what it saw.

Data is like that. Having your organization’s data and comparing it with bigger data can tell you more than just looking at your local data.  

For example, if you are a higher education professional like me and if your school is one of the campuses in the state system, you can refer to the state system data to compare various data points.

Additionally, if you are in the International Education department like me, you may want to look at the Open Door data or SEVIS by Numbers to get insights on international recruitment, enrollment, and retention.

Likewise, if you are selling the book titled “The Frog in the Well,” you may want to look at the order data in the e-commerce platform to see how well your book is selling. 

In short, looking at the bigger data helps you to make data-driven decisions.

In today’s post, I will show you how to analyze various datasets using API, and/or PDF files. Specifically, we will look at the State University of New York (SUNY) data that is publicly available

For those who are not familiar with the SUNY system, SUNY is comprised of 64 campuses and it can be categorized into 4 different types of institutions:

  1. Community Colleges
  2. Comprehensive Colleges
  3. Doctoral Degree Granting Institutions
  4. Technology Colleges

How do I know all of this? I downloaded the report from data.ny.gov and utilized the pivot table to see the list of institutions in each category. Click the Learn More button below to learn step-by-step:

1. Go to data.ny.gov to get the “CSV for Excel” file of Headcount Enrollment by Student Level and Student Load by Institutions of the State University of New York: Beginning Fall 2011

Headcount Enrollment by Student Level and Student Load by Institutions of the State University of New York: Beginning Fall 2011
Headcount Enrollment by Student Level and Student Load by Institutions of the State University of New York: Beginning Fall 2011

2. Click the Insert- Pivot Table to add a Pivot Table. From there, drag College or Institution Type to the Field, College or Institution Name to the Rows, Year to Columns then Undergraduate Full-Time to Values then set the values as Average.

Here is the list of SUNY campus names in each institution type:

Community Colleges

A list of Community Colleges
A list of Community Colleges

Comprehensive Colleges

A list of Comprehensive Colleges
A list of Comprehensive Colleges

Doctoral Degree Granting Institutions

A list of Doctoral Degree Granting Institutions
A list of Doctoral Degree Granting Institutions

Technology Colleges

A list of Technology Colleges
A list of Technology Colleges

The reason why we looked into each institution type is that we don’t want to compare community college numbers with flagship universities’ numbers. If we do, we will get skewed views of the numbers.

Unfortunately, New York State is not allowing direct access to the dataset of international student enrollment numbers. The public can only play with visualized interactive data using Tableau and download it as PDF files.

We will take a look at how to download them as PDF files and then use PowerQuery to get the data that we need in the next post.

How to get datasets using API

For now, we will utilize the report named State University of New York (SUNY) Trends in Enrollment of Students by Race/Ethnicity and by SUNY Sector: Beginning Fall 2002 using API to get the non-resident alien percentage per each institutional type.

*Non-resident aliens may include international students and other nonimmigration visa holders.

Pulling the % of nonresident aliens for 4 different SUNY campus types will give us a base to compare it with local data to see if my school has a similar % of international students compared to other schools in the same category. 

1. Go to the report, click API, and copy the JSON API Endpoint

State University of New York (SUNY) Trends in Enrollment of Students by Race/Ethnicity and by SUNY Sector: Beginning Fall 2002
State University of New York (SUNY) Trends in Enrollment of Students by Race/Ethnicity and by SUNY Sector: Beginning Fall 2002

2. There are two ways to call API

  • Click Data- Get Data- From File – From JSON
  • Click Data- From Web
Get data from JSON
Get data from JSON

3. Paste the API endpoint

  • JSON: Paste the endpoint next to the File name and click Import
  • Web: Paste the endpoint, select the URL and click Connect
JSON: Paste the link in the file name field
JSON: Paste the link in the file name field
Click the final URL
Click the final URL

4. API called a list of records

A list of records
A list of records

5. Under the Transform menu, click To Table

Transform the list to table
Transform the list to a table

6. Use the dropdown menu to select what you need then click OK

Finish the set up
Finish the setup

7. Click the extension icon next to the column name

Click the extension icon
Click the extension icon

8. Make sure to uncheck the Use original column name as prefix and click OK

Export values
Export values

9. Click anywhere in the table and click Detect Data Type to change the data type of each column. Choosing the correct data type is a necessary step to use the pivot table.

Detect Data Type
Detect Data Type

10. If the data type is not automatically detected, click the icon before the column name and select Whole Number as the datasets are numbers

Manually change the data type
Manually change the data type

11. Click Close & load to then add the query as a table

Close & Load to table
Close & Load to table

12. Now that we got the dataset from the NYS government website using API, we will customize it to see non-resident alien enrollment numbers

13. Click Insert- Pivot Table to launch the Pivot Table

Launch the Pivot Table
Launch the Pivot Table

14. A pop-up window will automatically select the table range. You can set the location for the Pivot Table, I will add it to the New Worksheet

Add the Pivot Table
Add the Pivot Table

15. Move “institution_sector” to Filter, “term” to Rows, and “total_enrollment” and “non_resident_alien” to Values

Create the Pivot Table
Create the Pivot Table

16. To get the % of non-New York residents compared to the total enrollment, go to PivotTable Analyze- Fields, Items & Sets and then click Calculated Field

Add the Calculated Field
Add the Calculated Field

17. Type the Name of the Field and click each field to create the %. When you are done click OK

Use the calculated field
Use the calculated field

18. You can also add a Pivot Chart by clicking PivotTable Analyze- Pivot Chart. I went with a Combo chart style to show total enrollment numbers and non-resident alien numbers as bar graphs and % of non-NY resident students as a line graph. To do so, you need to check the Secondary Axis for % of students field.

Pivot Chart
Pivot Chart

SUNY enrollment data from Fall 2002 to Fall 2022

Here are the analyzed SUNY data from Fall 2002 to Fall 2022 that show the percentage of non-New York resident students:

Community Colleges

Community Colleges

Comprehensive Colleges

Comprehensive Colleges

Doctoral Degree Granting Institutions

Doctoral Degree Granting Institutions

Technology Colleges

Technology Colleges

In summary, PowerQuery can be used to get the real time datasets that are open to the public. Pivot Table is helpful for grouping and organizing data to see what I need. Last but not least, a Pivot Chart can be used to visualize the data to see the trend. 

Here are some of the findings from the Pivot Table and Pivot Chart:

  • Doctoral Degree Granting Institutions have the highest number of enrolled non-resident-alien students with an average of 12.85% of total enrolled students since the Fall 2002 term, followed by Comprehensive Colleges, Technology Colleges, and Community Colleges
  • None of the SUNY institution types have a similar trend of total enrolled NY resident students and enrolled non-resident students.
  • The percentage of enrolled non-NY-resident students at SUNY Technology Colleges decreased the most rapidly.
  • Technology and Community Colleges recovered the percentage of enrolled non-resident students since the pandemic
  • Doctoral Degree Granting Institutions have seen a somewhat steady increase in the percentage of enrolled non-resident-aliens
  • The total enrollment decreased in the past 3 years regardless of the institution type

As I mentioned previously, this is not exactly what my office wanted to see as non-resident aliens include non-immigrant-visa holders other than student visa holders.

Therefore, in my next post, I will share how to convert PDF files to datasets using PowerQuery to get a closer look at the SUNY international enrollment data.

Did you find it helpful?

Click the button below to check out all the Professional series!

© copyright SEVIS SAVVY 2023