Decoding the Alphabet Soup of Degrees in the United States Postsecondary Education System Through Hybrid Method: Database and Text Mining
It addresses the need for automated classification of degree levels in education data to model student success and mobility, representing an incremental improvement over manual methods.
This paper tackles the problem of predicting postsecondary degree levels from ambiguously expressed abbreviations in student tracking reports, achieving 97.83% accuracy with a hybrid model combining database lookup and text mining.
This paper proposes a model to predict the levels (e.g., Bachelor, Master, etc.) of postsecondary degree awards that have been ambiguously expressed in the student tracking reports of the National Student Clearinghouse (NSC). The model will be the hybrid of two modules. The first module interprets the relevant abbreviatory elements embedded in NSC reports by referring to a comprehensive database that we have made of nearly 950 abbreviations for degree titles used by American postsecondary educators. The second module is a combination of feature classification and text mining modeled with CNN-BiLSTM, which is preceded by several steps of heavy pre-processing. The model proposed in this paper was trained with four multi-label datasets of different grades of resolution and returned 97.83\% accuracy with the most sophisticated dataset. Such a thorough classification of degree levels will provide insights into the modeling patterns of student success and mobility. To date, such a classification strategy has not been attempted except using manual methods and simple text parsing logic.