ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
This work addresses a gap in lexical substitution for language learners who want to improve their writing proficiency. The contribution is incremental, since it builds on existing lexical substitution tasks.
The authors address a limitation of existing lexical substitution tasks by introducing a new task focused on generating substitutes that improve language proficiency. They develop ProLex, a benchmark for this task, along with models that outperform ChatGPT by an average of 3.2% in F-score and achieve results comparable to GPT-4.
Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.
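The abstract reports results as an F-score but does not say how it is computed. As a rough illustration only, the sketch below uses the generic set-based F-score (harmonic mean of precision and recall) between a system's predicted substitutes and a gold set. The function name `f_score`, the lowercasing step, and the example words are assumptions made for illustration, not the ProLex evaluation procedure.

```python
def f_score(predicted, gold):
    """Set-based F1 between predicted and gold substitutes.

    Generic definition, assumed here for illustration; the benchmark's
    own scoring may differ.
    """
    # Compare case-insensitively and ignore duplicates.
    pred = {w.lower() for w in predicted}
    ref = {w.lower() for w in gold}
    true_positives = len(pred & ref)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: substitutes for "promote" in some context sentence.
system_output = ["encourage", "boost", "make"]
gold_substitutes = ["encourage", "foster", "boost", "advance"]
score = f_score(system_output, gold_substitutes)
```

In this hypothetical case two of the three predictions appear in the gold set of four, so precision is 2/3, recall is 1/2, and the F-score is 4/7.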