CL · Jul 28, 2023

Multilingual Lexical Simplification via Paraphrase Generation

arXiv:2307.15286v1 · 4 citations · h-index: 17
Originality: Incremental advance
AI Analysis

The paper addresses text simplification for non-native speakers and people with language impairments. The advance is incremental, since it builds on existing paraphrase and translation techniques.

The paper tackles lexical simplification with a multilingual method based on paraphrase generation. The method significantly outperforms BERT-based and zero-shot GPT-3-based methods on English, Spanish, and Portuguese.

Lexical simplification (LS) methods based on pretrained language models have made remarkable progress, generating potential substitutes for a complex word through analysis of its contextual surroundings. However, these methods require separate pretrained models for different languages and disregard the preservation of sentence meaning. In this paper, we propose a novel multilingual LS method via paraphrase generation, as paraphrases provide diversity in word selection while preserving the sentence's meaning. We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation that supports hundreds of languages. After feeding the input sentence into the encoder of the paraphrase model, we generate the substitutes based on a novel decoding strategy that concentrates solely on the lexical variations of the complex word. Experimental results demonstrate that our approach significantly surpasses BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
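
The decoding idea in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: the decoder is forced to copy the sentence up to the complex word, and the highest-scoring next words at that position are taken as substitute candidates. The names substitute_candidates and toy_next_token_probs are hypothetical, and the probability table is hand-written for illustration; a real system would query a multilingual translation model instead.

```python
from typing import Callable, Dict, List, Sequence

# Assumed interface (hypothetical): given the source tokens and a forced
# decoder prefix, return a probability for each possible next word.
NextTokenFn = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def substitute_candidates(
    tokens: Sequence[str],
    complex_index: int,
    next_token_probs: NextTokenFn,
    top_k: int = 3,
) -> List[str]:
    """Rank substitutes for tokens[complex_index] with a forced prefix."""
    source = list(tokens)
    # Force the decoder to reproduce everything before the complex word,
    # so variation is only allowed at the complex word's position.
    prefix = source[:complex_index]
    complex_word = source[complex_index]
    probs = next_token_probs(source, prefix)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # Drop the complex word itself: copying it is not a simplification.
    return [word for word, _ in ranked if word != complex_word][:top_k]


def toy_next_token_probs(
    source: Sequence[str], prefix: Sequence[str]
) -> Dict[str, float]:
    """Hand-written stand-in for a paraphrase model (illustration only)."""
    return {
        "compulsory": 0.40,
        "required": 0.30,
        "mandatory": 0.20,
        "optional": 0.10,
    }


sentence = "Attendance is compulsory for all students".split()
candidates = substitute_candidates(sentence, 2, toy_next_token_probs, top_k=2)
print(candidates)
```

In this sketch the candidates are ranked only by next-word probability. Checking that the rest of the sentence still fits after the substitute, which the abstract's emphasis on meaning preservation suggests, is left out for brevity.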

Code Implementations: 1 repo
