CY · CL · LG · AP · Sep 12, 2025

Predicting First Year Dropout from Pre Enrolment Motivation Statements Using Text Mining

arXiv:2509.16224v1 · 1 citation · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of early dropout prediction in higher education, but its contribution is incremental: it shows that text data can match existing predictors without improving on them.

The study tackled first-year university dropout prediction by analysing pre-enrolment motivation statements with text mining. Text analysis alone predicted dropout about as well as traditional student characteristics, and combining the two gave no improvement.

Preventing student dropout is a major challenge in higher education, and it is difficult to predict before enrolment which students are likely to drop out and which are likely to succeed. High-school GPA is a strong predictor of dropout, but much of the variance in dropout remains unexplained. This study focused on predicting university dropout by using text mining techniques to extract the information contained in motivation statements written by students. By combining text data with classic predictors of dropout in the form of student characteristics, we attempt to enhance the available set of predictive student characteristics. Our dataset consisted of 7,060 motivation statements from students enrolling in a non-selective bachelor's programme at a Dutch university in 2014 and 2015. Support Vector Machines were trained on 75 percent of the data, and several models were evaluated on the remaining 25 percent (the test data). We used various combinations of student characteristics and text features, such as TF-IDF, topic modelling, and the LIWC dictionary. Results showed that, although combining text and student characteristics did not improve the prediction of dropout, text analysis alone predicted dropout about as well as a set of student characteristics. Suggestions for future research are provided.
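The pipeline described in the abstract (TF-IDF text features, optionally concatenated with student characteristics, fed to a linear SVM and evaluated on a 75/25 train/test split) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it uses whitespace tokenisation, one common smoothed TF-IDF variant, and a linear SVM trained with Pegasos (stochastic sub-gradient descent on the hinge loss) in place of whatever SVM implementation the study actually used. The helper names (`fit_idf`, `tfidf`, `combine`, `train_linear_svm`, `split_75_25`, `demo`), the `hs_gpa` feature, and the eight toy statements are hypothetical and invented for illustration. Topic modelling and LIWC features are not shown.

```python
import math
import random
from collections import Counter


def fit_idf(train_docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, learned from the training split only."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(doc.lower().split()))  # lowercase whitespace tokenisation (simplified)
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    counts = Counter(tok for tok in doc.lower().split() if tok in idf)
    vec = {f"w:{tok}": c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {k: v / norm for k, v in vec.items()}


def combine(text_vec, chars):
    """Concatenate sparse text features with (already scaled) student characteristics."""
    vec = dict(text_vec)
    for name, value in chars.items():
        vec[f"c:{name}"] = value
    vec["bias"] = 1.0  # constant feature so the linear SVM can learn an offset
    return vec


def score(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos on the L2-regularised hinge loss; X: sparse dicts, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * score(w, X[i])
            for f in w:  # regularisation step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: step towards y_i * x_i
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def accuracy(w, X, y):
    hits = sum((score(w, x) >= 0) == (label == 1) for x, label in zip(X, y))
    return hits / len(y)


def split_75_25(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.75 * n)
    return idx[:cut], idx[cut:]


def demo():
    # Toy rows invented for illustration (NOT data from the study):
    # (motivation statement, standardised high-school GPA, +1 = dropped out / -1 = retained)
    data = [
        ("i have wanted to study this subject for years", 0.9, -1),
        ("my parents told me to apply", -0.7, 1),
        ("i enjoy reading about this field and want to do research", 0.6, -1),
        ("i do not know what else to study", -0.9, 1),
        ("the courses fit my interests and career plans", 0.4, -1),
        ("a friend is also going so i applied", -0.3, 1),
        ("i want to understand this field and do research", 0.2, -1),
        ("not sure this is for me but i will try", -0.5, 1),
    ]
    train, test = split_75_25(len(data))
    idf = fit_idf([data[i][0] for i in train])  # IDF from the training split only
    views = {
        "text": lambda doc, gpa: combine(tfidf(doc, idf), {}),
        "characteristics": lambda doc, gpa: combine({}, {"hs_gpa": gpa}),
        "combined": lambda doc, gpa: combine(tfidf(doc, idf), {"hs_gpa": gpa}),
    }
    results = {}
    for name, featurise in views.items():
        X_tr = [featurise(data[i][0], data[i][1]) for i in train]
        X_te = [featurise(data[i][0], data[i][1]) for i in test]
        w = train_linear_svm(X_tr, [data[i][2] for i in train])
        results[name] = accuracy(w, X_te, [data[i][2] for i in test])
    return results


if __name__ == "__main__":
    for view, acc in demo().items():
        print(f"{view:>15}: test accuracy {acc:.2f}")
```

Fitting the IDF weights on the training split only keeps the test statements from leaking into the features. With real data, comparing the three views (text only, characteristics only, combined) mirrors the comparison the abstract reports, although the study's own evaluation metric may differ from plain accuracy; on eight toy rows the numbers carry no meaning.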
