Lexical Bias In Essay Level Prediction
This work addresses essay level prediction for non-native English speakers, but it appears incremental: it focuses on feature engineering and model selection without introducing a new paradigm.
The paper tackles the problem of automatically predicting the level of non-native English speakers from their written essays, achieving state-of-the-art performance among the 14 systems in the CAp 2018 data science challenge.
Automatically predicting the level of non-native English speakers given their written essays is an interesting machine learning problem. In this work I present the system "balikasg", which achieved state-of-the-art performance in the CAp 2018 data science challenge among 14 systems. I detail the feature extraction, feature engineering, and model selection steps, and I evaluate how these decisions impact the system's performance. The paper concludes with remarks on future work.