CL, Apr 2, 2021

IITK@LCP at SemEval 2021 Task 1: Classification for Lexical Complexity Regression Task

Neil Rajiv Shirude, Sagnik Mukherjee, Tushar Shandhilya, Ananta Mukherjee, Ashutosh Modi

arXiv:2104.01046v11.04 citations, Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses lexical complexity prediction, a task relevant to natural language processing applications. Its contribution is incremental, since it adapts existing models to a specific competition task.

The paper tackles the lexical complexity prediction task by treating regression as an aggregation of classification and regression models, achieving mean absolute error (MAE) scores of 0.0654 on Sub-Task 1 and 0.0811 on Sub-Task 2.

This paper describes our contribution to SemEval 2021 Task 1: Lexical Complexity Prediction. In our approach, we leverage the ELECTRA model and attempt to mirror the data annotation scheme. Although the task is a regression task, we show that we can treat it as an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE score of 0.0654 for Sub-Task 1 and MAE of 0.0811 on Sub-Task 2. Additionally, we used the concept of weak supervision signals from Gloss-BERT in our work, and it significantly improved the MAE score in Sub-Task 1.
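The idea of treating a regression target as an aggregation of classification and regression models can be illustrated with a minimal sketch. It is not the authors' implementation: the five equally spaced bin centres, the plain averaging rule, and the function names `expected_score`, `aggregate`, and `mean_absolute_error` are all assumptions made for illustration, and the ELECTRA and Gloss-BERT components are left out entirely. Each classifier is assumed to output a probability distribution over discrete complexity levels, which is converted to a continuous score and then averaged with the outputs of any regression models.

```python
from statistics import mean

# Hypothetical bin centres: a 5-level rating scale mapped onto [0, 1].
# The abstract does not specify this mapping; it is assumed here.
BIN_CENTRES = [0.0, 0.25, 0.5, 0.75, 1.0]


def expected_score(class_probs):
    """Convert a classifier's distribution over bins into a continuous score."""
    if len(class_probs) != len(BIN_CENTRES):
        raise ValueError("expected one probability per bin")
    total = sum(class_probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Probability-weighted average of the bin centres (normalised defensively).
    return sum(p * c for p, c in zip(class_probs, BIN_CENTRES)) / total


def aggregate(class_probs_list, regression_outputs):
    """Average the scores from all classification and regression models.

    Plain averaging is an assumed aggregation rule, not the paper's.
    """
    scores = [expected_score(p) for p in class_probs_list]
    scores.extend(regression_outputs)
    if not scores:
        raise ValueError("need at least one model output")
    return mean(scores)


def mean_absolute_error(y_true, y_pred):
    """MAE, the metric reported for both sub-tasks."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))
```

For one target word, `aggregate` would receive one probability vector per classifier and one scalar per regressor. Collecting the aggregated predictions over a test set and passing them to `mean_absolute_error` together with the gold scores gives a figure comparable in kind to the MAE values quoted above.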
