Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment
This work addresses automatic readability assessment for educational and text analysis applications. Its contribution is incremental: it extends existing deep learning approaches by integrating traditional linguistic features.
The authors tackle automatic readability assessment by incorporating linguistic features into neural network models. They learn syntactic dense embeddings of these features using a correlation graph and report significantly better performance than BERT-only models across six datasets at two proficiency levels.
Deep learning models for automatic readability assessment generally discard the linguistic features traditionally used in machine learning models for the task. We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on these features. To capture the relationships between the features, we form a correlation graph among them and use it to learn their embeddings, so that similar features are represented by similar embeddings. Experiments with six data sets of two proficiency levels demonstrate that our proposed methodology can complement a BERT-only model to achieve significantly better performance for automatic readability assessment.
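The correlation-graph idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the graph is built or how the embeddings are trained, so the use of Pearson correlation, the `threshold` value, the random initialization, and the mean-aggregation smoothing step are all assumptions made for illustration, and the function names `correlation_graph` and `smooth_embeddings` are hypothetical.

```python
import numpy as np


def correlation_graph(features, threshold=0.7):
    """Build an undirected graph over linguistic features.

    features: array of shape (n_documents, n_features).
    Two features are linked when the absolute Pearson correlation of
    their values across documents reaches `threshold` (an assumed cutoff).
    """
    corr = np.corrcoef(features, rowvar=False)  # (n_features, n_features)
    corr = np.nan_to_num(corr)                  # constant features yield NaN
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                  # no self-loops in the graph
    return adj


def smooth_embeddings(adj, dim=8, steps=1, seed=0):
    """Assumed stand-in for embedding learning: graph smoothing.

    Starts from random feature embeddings and replaces each one with the
    mean of itself and its graph neighbours, so that linked (correlated)
    features are pulled toward similar vectors.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    emb = rng.normal(size=(n, dim))
    a_hat = adj + np.eye(n)                     # include each node itself
    propagate = a_hat / a_hat.sum(axis=1, keepdims=True)
    for _ in range(steps):
        emb = propagate @ emb
    return emb
```

In an actual model, the smoothing step would presumably be replaced by trainable graph-based layers whose output is combined with the BERT representation; the sketch only shows how a correlation graph can push related features toward similar embeddings.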