LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction
This work addresses lexical complexity prediction for natural language processing applications, but it is incremental as it applies existing methods to a new dataset without introducing major innovations.
The paper tackled the problem of predicting the lexical complexity of single words in English text, using a dataset annotated on a five-point scale. The result was a system based on logistic regression with a range of linguistic features, evaluated with metrics such as mean absolute error and Pearson correlation, though no specific numbers were provided.
This paper describes team LCP-RIT's submission to SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in which words in context were annotated with respect to their complexity using a five-point Likert scale. Our system uses logistic regression and a wide range of linguistic features (e.g., psycholinguistic features, n-grams, word frequency, POS tags) to predict the complexity of single words in this dataset. We analyze the impact of different linguistic features on classification performance, and we evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.
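The abstract names four evaluation metrics without defining them. The block below is a minimal sketch of how they can be computed from gold and predicted complexity scores in plain Python. It is not the team's evaluation script or the official SemEval scorer. Spearman correlation is computed here as the Pearson correlation of average ranks (ties share the mean of their positions), which is the standard definition. The function names and the toy scores are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def mae(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between gold and predicted scores."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def mse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; undefined for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical gold and predicted complexity scores for four words.
    gold = [0.0, 0.25, 0.5, 0.75]
    pred = [0.1, 0.2, 0.6, 0.7]
    print("MAE", mae(gold, pred))
    print("MSE", mse(gold, pred))
    print("Pearson", pearson(gold, pred))
    print("Spearman", spearman(gold, pred))
```

MAE and MSE reward scores that are close to the gold values, while the two correlations only reward getting the ordering (Spearman) or the linear trend (Pearson) right, which is why reporting all four gives a fuller picture of a system's behaviour.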