CL | Jan 4, 2023

UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?

arXiv:2301.01764v2294 citations | h-index: 39 | Has Code
AI Analysis

This work addresses the complexity of existing lexical simplification pipelines, showing researchers and practitioners that a compute-heavy approach can yield strong results with minimal training data. Its contribution is incremental in that it leverages existing large language models.

The paper tackles lexical simplification with a simple pipeline built on prompted GPT-3 responses. In low-data settings it outperforms previous state-of-the-art models by a wide margin, achieving top results in the English, Spanish, and Portuguese tracks of the TSAR-2022 shared task.

Previous state-of-the-art models for lexical simplification consist of complex pipelines with several components, each of which requires deep technical knowledge and fine-tuned interaction to achieve its full potential. As an alternative, we describe a frustratingly simple pipeline based on prompted GPT-3 responses, beating competing approaches by a wide margin in settings with few training instances. Our best-performing submission to the English language track of the TSAR-2022 shared task consists of an "ensemble" of six different prompt templates with varying context levels. As a late-breaking result, we further detail a language transfer technique that allows simplification in languages other than English. Applied to the Spanish and Portuguese subsets, we achieve state-of-the-art results with only minor modifications to the original prompts. Aside from detailing the implementation and setup, we spend the remainder of this work discussing the particularities of prompting and implications for future work. Code for the experiments is available online at https://github.com/dennlinger/TSAR-2022-Shared-Task
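The idea of an "ensemble" of prompt templates with varying context levels can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the template wording, the `complete` callable (a stand-in for whatever function sends a prompt to a language model and returns its text), and the Borda-style rank aggregation are hypothetical and are not taken from the paper or its repository. Only two templates are shown rather than six.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical templates with different context levels (illustrative wording only).
TEMPLATES: List[str] = [
    # Full sentence context.
    "Context: {context}\n"
    "Question: Given the above context, list ten simpler alternatives for '{word}'.\n"
    "Answer:",
    # No context, word only.
    "List ten simpler synonyms for the word '{word}'.\nAnswer:",
]


def parse_candidates(response: str) -> List[str]:
    """Split a comma- or newline-separated response into unique, lowercased candidates."""
    seen = set()
    candidates: List[str] = []
    for raw in response.replace("\n", ",").split(","):
        # Drop list numbering such as "1. " or "- " and trailing periods.
        cand = raw.strip().lstrip("0123456789.)- ").rstrip(". ").lower()
        if cand and cand not in seen:
            seen.add(cand)
            candidates.append(cand)
    return candidates


def ensemble_rank(ranked_lists: List[List[str]], complex_word: str, top_k: int = 10) -> List[str]:
    """Merge ranked lists with a Borda-style score (an assumed aggregation scheme)."""
    scores: Dict[str, int] = defaultdict(int)
    for candidates in ranked_lists:
        for rank, cand in enumerate(candidates):
            if cand == complex_word.lower():
                continue  # never suggest the complex word itself
            scores[cand] += max(top_k - rank, 0)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return ordered[:top_k]


def simplify(context: str, word: str, complete: Callable[[str], str], top_k: int = 10) -> List[str]:
    """Query every template through `complete` and merge the parsed answers."""
    ranked_lists = [
        parse_candidates(complete(template.format(context=context, word=word)))
        for template in TEMPLATES
    ]
    return ensemble_rank(ranked_lists, word, top_k)
```

Passing the model call in as `complete` keeps the sketch independent of any particular API client, so it can be exercised with a stub function before being wired to a real model.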

Code Implementations: 1 repo
