CL, AI · Sep 19, 2022

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

arXiv:2209.09034v2582 citations · h-index: 45
AI Analysis

ALEXSIS-PT provides a resource for improving lexical simplification systems that make Brazilian Portuguese texts more accessible, but the contribution is incremental: it extends an existing protocol to a new language.

The paper introduces ALEXSIS-PT, a new multi-candidate dataset for Brazilian Portuguese lexical simplification containing 9,605 candidate substitutions for 387 complex words, and finds that BERTimbau achieved the highest performance among evaluated models.

Lexical simplification (LS) is the task of automatically replacing complex words with easier ones, making texts more accessible to various target populations (e.g., individuals with low literacy, individuals with learning disabilities, second language learners). To train and test models, LS systems usually require corpora that feature complex words in context along with their candidate substitutions. To continue improving the performance of LS systems, we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words. ALEXSIS-PT was compiled following the ALEXSIS protocol for Spanish, opening exciting new avenues for cross-lingual models. ALEXSIS-PT is the first multi-candidate LS dataset that contains Brazilian newspaper articles. We evaluated four models for substitute generation on this dataset: mDistilBERT, mBERT, XLM-R, and BERTimbau. BERTimbau achieved the highest performance across all evaluation metrics.
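To make the idea of a multi-candidate corpus concrete, the following is a minimal Python sketch of how one item (a complex word in context plus its candidate substitutions) might be represented, and how a system's generated substitutes could be checked against it. The class name `LSInstance`, the helper `hit_at_k`, the field layout, and the example sentence are all hypothetical and invented for illustration; they are not the ALEXSIS-PT file format, its data, or the paper's evaluation metrics.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LSInstance:
    """One multi-candidate item (hypothetical layout, not the real file format)."""
    sentence: str
    complex_word: str
    # One entry per annotator suggestion, so repeated words act as votes.
    suggestions: list[str] = field(default_factory=list)

    def ranked_candidates(self) -> list[tuple[str, int]]:
        """Return gold candidates sorted by how many annotators proposed them."""
        counts = Counter(s.lower() for s in self.suggestions)
        # Descending vote count, then alphabetical order to keep ties stable.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def hit_at_k(instance: LSInstance, predictions: list[str], k: int) -> bool:
    """True if any of the first k predictions is among the gold candidates."""
    gold = {word for word, _ in instance.ranked_candidates()}
    return any(p.lower() in gold for p in predictions[:k])


# Invented example sentence and suggestions, for illustration only.
item = LSInstance(
    sentence="O governo anunciou medidas drásticas.",
    complex_word="drásticas",
    suggestions=["severas", "radicais", "severas", "fortes"],
)

print(item.ranked_candidates())
print(hit_at_k(item, ["extremas", "radicais"], k=1))
print(hit_at_k(item, ["extremas", "radicais"], k=2))
```

Storing one entry per annotator suggestion keeps the vote counts recoverable, which is what distinguishes a multi-candidate resource from one with a single gold substitute per complex word.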

