Research question
How strongly does retention rate predict completion rate across different racial/ethnic groups in higher education, and how does this relationship vary by institutional type (4-year institutions vs. less than 4-year institutions)?
Model
This study uses linear regression analysis in R to examine the relationship between retention rate and completion rate across different racial/ethnic groups and institutional types.
Factors
- Independent variables (X): Retention rate, Race/Ethnicity
- Dependent variable (Y): Completion rate
- Analysis approach: Separate linear regression models were fitted for 4-year and less-than-4-year institutions to assess differences in the relationship between retention rate and completion rate across racial/ethnic groups
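
The approach above (one simple regression per racial/ethnic group, fitted separately for each institution type) can be sketched as follows. This is a minimal illustration in Python, not the author's R script: the rows are made-up numbers, and the names `fit_line` and `fit_by_group` are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up rows for illustration only:
# (institution type, group, retention rate %, completion rate %)
rows = [
    ("4yr", "White", 60.0, 50.0), ("4yr", "White", 70.0, 58.0), ("4yr", "White", 80.0, 68.0),
    ("4yr", "Black", 60.0, 40.0), ("4yr", "Black", 70.0, 52.0), ("4yr", "Black", 80.0, 64.0),
    ("lt4yr", "White", 50.0, 45.0), ("lt4yr", "White", 60.0, 50.0), ("lt4yr", "White", 70.0, 58.0),
    ("lt4yr", "Black", 50.0, 30.0), ("lt4yr", "Black", 60.0, 38.0), ("lt4yr", "Black", 70.0, 44.0),
]

def fit_by_group(rows):
    """Fit one regression per (institution type, group) pair."""
    groups = {}
    for inst_type, group, retention, completion in rows:
        xs, ys = groups.setdefault((inst_type, group), ([], []))
        xs.append(retention)
        ys.append(completion)
    return {key: fit_line(xs, ys) for key, (xs, ys) in groups.items()}

models = fit_by_group(rows)
for (inst_type, group), (intercept, slope, r2) in sorted(models.items()):
    print(f"{inst_type:6s} {group:6s} intercept={intercept:7.2f} slope={slope:.2f} R2={r2:.3f}")
```

Fitting each group separately, rather than one model with interaction terms, keeps every coefficient directly readable as "completion change per point of retention" for that group and institution type.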
Data source
College Scorecard API: Click here to review the API Documentation
Selected Data Elements from College Scorecard
Name of Data Element | Developer-friendly name | API data type |
---|---|---|
First-time, full-time student retention rate at four-year institutions | retention_rate.four_year.full_time_pooled | float |
First-time, full-time student retention rate at less-than-four-year institutions | retention_rate.lt_four_year.full_time_pooled | float |
First-time, part-time student retention rate at four-year institutions | retention_rate.four_year.part_time_pooled | float |
First-time, part-time student retention rate at less-than-four-year institutions | retention_rate.lt_four_year.part_time_pooled | float |
Completion rate for first-time, full-time students at four-year institutions (150% of expected time to completion) for white students | completion_rate_4yr_150_white | float |
Completion rate for first-time, full-time students at four-year institutions (150% of expected time to completion) for black students | completion_rate_4yr_150_black | float |
Completion rate for first-time, full-time students at four-year institutions (150% of expected time to completion) for Hispanic students | completion_rate_4yr_150_hispanic | float |
Completion rate for first-time, full-time students at four-year institutions (150% of expected time to completion) for Asian students | completion_rate_4yr_150_asian | float |
Completion rate for first-time, full-time students at four-year institutions (150% of expected time to completion) for American Indian/Alaska Native students | completion_rate_4yr_150_aian | float |
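
As a rough sketch of how the developer-friendly names above might be combined into a single API request, the Python snippet below only builds a query URL and does not call the API. Several details are assumptions to verify against the API documentation: the endpoint URL, the `latest.student.` and `latest.completion.` prefixes, and the `api_key`, `fields`, `page`, and `per_page` parameter names. The helper `build_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the College Scorecard API documentation.
BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"

# Developer-friendly names taken from the table above.
RETENTION_FIELDS = [
    "retention_rate.four_year.full_time_pooled",
    "retention_rate.lt_four_year.full_time_pooled",
]
COMPLETION_FIELDS = [
    "completion_rate_4yr_150_white",
    "completion_rate_4yr_150_black",
    "completion_rate_4yr_150_hispanic",
    "completion_rate_4yr_150_asian",
    "completion_rate_4yr_150_aian",
]

def build_query(api_key, page=0, per_page=100):
    """Build a request URL asking only for the fields used in the analysis."""
    fields = ["id", "school.name"]
    # Assumed prefixes: "latest" data year, "student" / "completion" categories.
    fields += [f"latest.student.{name}" for name in RETENTION_FIELDS]
    fields += [f"latest.completion.{name}" for name in COMPLETION_FIELDS]
    params = {
        "api_key": api_key,
        "fields": ",".join(fields),
        "page": page,
        "per_page": per_page,
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_query("YOUR_API_KEY")
print(url)
```

Requesting only the needed fields keeps each page of results small; because the API returns results in pages, covering all institutions would mean repeating the request with increasing `page` values.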
- A few things to note about the dataset
- Number of institutions: As of March 2025, the dataset contains data for more than 6,400 institutions. Since the data is connected via API, the number of rows may change as the dataset is updated.
- Full-time students only: The analysis is based on full-time student data, as completion rates for part-time students are not available in the College Scorecard dataset.
- Data collection period: As of March 2025, the data was last updated on January 16, 2025, so the visualized datasets below reflect the data available as of that update. To refresh the analysis, download the .pbix file from my GitHub and refresh the data. The most recent update date can always be found here: https://collegescorecard.ed.gov/data/
- 150% completion rate: In the context of a 4-year institution, 150% completion rate refers to students who graduate within 6 years (150% of the standard 4-year program length).
Regression Analysis and Data Visualization as of March 2025
R script: Click here to view the R script on my GitHub page
Linear Regression Table
- Race_Ethnicity: The dependent variable (completion rate) grouped by race/ethnicity
- Retention Rate (Predictor): The predictor (independent variable) used in the regression model:
- Baseline Completion Rate (if Retention=0) == Intercept: The expected completion rate when the retention rate is zero (not usually meaningful but part of the regression equation). It is not an independent variable; it’s a baseline value.
- FT Retention Rate at 4yr Institutions: The coefficient that represents the change in completion rate for every 1-unit increase in the retention rate.
- Predicted Completion Rate = Baseline Completion Rate (if Retention=0) + (FT Retention Rate at 4yr Institutions × FT AVG Retention Rate), where the FT AVG Retention Rate is 71.8% for four-year institutions and 69.67% for less-than-four-year institutions
- Coefficient: How much the dependent variable (completion rate) is expected to change for each 1 percentage point increase in the independent variable (retention rate).
- Baseline Completion Rate (if Retention=0): not always meaningful on its own; it is used to calculate the Predicted Completion Rate
- FT Retention Rate at 4yr Institutions: for every 1 percentage point increase in retention rate, the predicted completion rate changes by this value. A coefficient close to or above 1 means that completion rate rises about as fast as, or faster than, retention rate. (*Note: If the coefficient is negative, it indicates an inverse relationship, e.g. as retention increases, completion decreases.)
- p_value: Whether the independent variable's effect on the dependent variable is statistically significant. If p < 0.05, there is strong evidence that retention rate is related to completion rate (statistically significant). If p ≥ 0.05, there is weak or no evidence of a real effect (not statistically significant).
- R_squared: How well the independent variable explains the variability in the dependent variable. If the value is closer to 1, it means the model explains most of the variance in completion rates (stronger explanatory power). If the value is closer to 0, other factors not included in the model are influencing completion rates (weaker explanatory power). For educational data, an R² of 0.3 to 0.5 is typically considered moderate, while above 0.6 is strong.
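
To make the Predicted Completion Rate formula concrete, here is a small Python sketch (the analysis itself was done in R). It uses the NHPI figures reported for 4-year institutions in this write-up (coefficient 1.20, predicted completion rate 85.76%, average retention 71.8%). The intercept is not stated in the text, so the value below is only what those rounded figures imply; treat it as an approximation, not a reported result.

```python
def predicted_completion(intercept, slope, retention):
    """Regression line: predicted completion = intercept + slope * retention."""
    return intercept + slope * retention

AVG_RETENTION_4YR = 71.8   # FT AVG Retention Rate for four-year institutions (%)
slope_nhpi = 1.20          # reported coefficient for NHPI students
predicted_nhpi = 85.76     # reported predicted completion rate (%)

# Intercept implied by the rounded figures above (roughly -0.40); an assumption,
# since the intercept itself is not quoted in the text.
implied_intercept = predicted_nhpi - slope_nhpi * AVG_RETENTION_4YR

at_average = predicted_completion(implied_intercept, slope_nhpi, AVG_RETENTION_4YR)
one_point_higher = predicted_completion(implied_intercept, slope_nhpi, AVG_RETENTION_4YR + 1)

print(f"implied intercept:         {implied_intercept:.2f}")
print(f"prediction at avg:         {at_average:.2f}")
print(f"gain per 1 retention point: {one_point_higher - at_average:.2f}")
```

The last line shows what the coefficient means in practice: moving retention up by one percentage point moves the predicted completion rate by the coefficient, here about 1.20 points.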
4-year institutions

- Overall, the retention rate has a positive impact on the completion rate for all racial groups. Given that an R-squared of 0.3 to 0.5 is typically considered moderate for educational data, the groups for which the model explains the variability well are 2 or More races, Black, and White students.
- The impact of retention rate on completion rate is the highest for NHPI students, with a predicted completion rate of 85.76% and a coefficient of 1.20, followed by students of 2 or More races (78.68%, 1.10) and students of unknown race/ethnicity (76.57%, 1.07).
- White students have the lowest coefficient (0.84), meaning retention rate has a comparatively smaller effect on completion for them.
- The p_value for all races rounds to 0.00, which can be interpreted to mean that the impact of retention rate on completion rate is statistically significant across all races. It is worth noting that p_values this small may partly reflect the large sample size (about 6,500 institutions).
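
The point about sample size can be illustrated with the standard t statistic for the slope of a one-predictor regression, t = sqrt(R² × (n − 2) / (1 − R²)). The Python sketch below holds R² fixed at a deliberately weak 0.01 and varies n. Two assumptions: the p-value uses a normal approximation to the t distribution (reasonable for large n, rough for small n), and n = 6500 is the approximate institution count mentioned above, not the exact number of rows in any one group's model.

```python
import math

def slope_t_statistic(r_squared, n):
    """t statistic for the slope in a simple (one-predictor) linear regression."""
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))

def approx_two_sided_p(t):
    """Two-sided p-value using the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

WEAK_R_SQUARED = 0.01  # the model explains only 1% of the variance

p_values = {}
for n in (50, 500, 6500):
    t = slope_t_statistic(WEAK_R_SQUARED, n)
    p_values[n] = approx_two_sided_p(t)
    print(f"n={n:5d}  t={t:5.2f}  p={p_values[n]:.3g}")
```

With the same weak fit, the result is not significant at n = 50 but becomes overwhelmingly significant at n = 6500, which is why the p_value should be read together with R_squared rather than on its own.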
Less than 4-year institutions

- Although the average predicted completion rate is lower than at 4-year institutions, the retention rate has a positive impact on the completion rate for all racial groups enrolled at less than 4-year institutions. It is worth noting that the R-squared values for all racial groups are lower than 0.3; this suggests that, for students at less than 4-year institutions, factors other than the retention rate play a larger role in the completion rate.
- NHPI students have the highest predicted completion rate (63.16%, coefficient of 0.91), the same group that ranks highest at 4-year institutions. The second- and third-highest predicted completion rates are those of Hispanic and White students (61.18%, 0.88, and 57.09%, 0.82 respectively).
- As with 4-year institutions, the p_value for every racial group at less than 4-year institutions rounds to 0.00. Again, this may partly reflect the large sample size (N = 6,480+).
Visualized plots for each race
- What do the black dots represent?
- Each dot represents a college or university.
- The position of a dot on the graph shows:
- The x-axis (horizontal): The average first-year retention rate at that institution (how many students return after their first year).
- The y-axis (vertical): The completion rate for that student group at the institution (how many eventually graduate).
- What does the colored line mean?
- The line shows the general trend between retention and completion rates.
- A steeper line suggests a stronger relationship: Institutions with higher retention rates tend to have higher completion rates for that student group.
4-year institutions

Less than 4-year institutions

Findings
Comparison between students enrolled at 4-year institutions and less than 4-year institutions

- The average predicted completion rate for students enrolled at 4-year institutions (72.06%) is 18.27 percentage points higher than for students enrolled at less than 4-year institutions (53.79%).
- The two racial groups with the largest gaps in predicted completion rates between 4-year and less than 4-year institutions are AIAN and Unknown students (33.34 and 32.69 percentage points respectively). This suggests that AIAN and Unknown students may face greater barriers to completion at less than 4-year institutions compared to other racial groups.
- On the other hand, the two racial groups with the smallest gaps in predicted completion rates are Hispanic and White students (3.29 and 3.19 percentage points respectively). This could indicate that these groups experience more similar outcomes regardless of institutional type, though further exploration would be needed to understand why.
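
The gaps above are differences in percentage points, which is not the same as a relative percent difference. A short Python sketch using only figures quoted in this write-up (the overall averages and the NHPI predictions) shows both calculations; the helper names are hypothetical.

```python
def gap_in_points(four_year, lt_four_year):
    """Absolute gap between two rates, in percentage points."""
    return four_year - lt_four_year

def gap_relative(four_year, lt_four_year):
    """Relative gap, as a percent of the less-than-4-year rate."""
    return (four_year - lt_four_year) / lt_four_year * 100

# Predicted completion rates (%) quoted in the text: (4-year, less than 4-year)
quoted = {
    "Average": (72.06, 53.79),
    "NHPI": (85.76, 63.16),
}

gaps = {
    name: (gap_in_points(a, b), gap_relative(a, b))
    for name, (a, b) in quoted.items()
}

for name, (points, relative) in gaps.items():
    print(f"{name:8s} gap = {points:.2f} percentage points ({relative:.1f}% relative)")
```

For the overall average, the 18.27-point gap corresponds to a relative difference of roughly 34%, so it matters which of the two is being reported.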

Takeaways for college professionals
- Retention Rate Matters but Is Not the Only Factor:
- Retention rate has a statistically significant effect on completion rate (p-value ≈ 0.00). However, low R-squared values indicate that retention alone does not strongly explain completion rates. This means that other key factors affecting completion rates are missing from the model (see Limitations/suggestions for future analysis).
- Targeted Support for NHPI Students May Have a High Impact
- Among racial groups, NHPI students show the highest predicted completion rates and the largest coefficients at both institution types.
- Institutional interventions aimed at improving NHPI student retention (such as culturally responsive advising, mentoring, and academic support) may therefore meaningfully boost their completion rates.
- Completion at Less-Than-4-Year Institutions Is More Complex
- The lower R-squared values suggest that completion at less-than-4-year institutions is influenced by multiple factors beyond retention.
Limitations/suggestions for future analysis
- Full-Time vs. Part-Time Students
- The analysis is based on full-time student data, as completion rates for part-time students are not available in the College Scorecard dataset.
- If future data includes part-time student completion rates, a more comprehensive analysis could assess whether retention impacts completion differently for part-time vs. full-time students.
- Geographic Differences
- This analysis does not account for regional variations in the relationship between retention and completion rates.
- Future research could explore state-level differences to determine whether geographic differences influence completion outcomes.
© copyright SEVIS SAVVY 2025