CL, Oct 23, 2022

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Sedrick Scott Keh, Rohit K. Bharadwaj, Emmy Liu, Simone Tedeschi, Varun Gangal, Roberto Navigli

CMU

arXiv:2210.12846v124.0292 citations | h-index: 65 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses euphemism detection, a domain-specific NLP task, with incremental improvements through data augmentation and ensemble methods.

The paper tackles automatic euphemism detection with EUREKA, an ensemble-based approach that corrects potentially mislabelled rows in the dataset, curates an expanded corpus (EuphAug), and applies kNN-based methods. It reports state-of-the-art results, ranking first on the public leaderboard of the Euphemism Detection Shared Task with a macro F1 score of 0.881.

We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881. Our code is available at https://github.com/sedrickkeh/EUREKA.
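As a rough illustration of the two ideas the abstract names, kNN over sentence representations and macro F1 scoring, the following is a minimal sketch and not the authors' implementation (see their repository for that). The 2-D vectors are hand-written toy stand-ins for model representations, and the helper names `cosine`, `knn_predict`, and `macro_f1` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Label a query by majority vote among its k most similar training vectors."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): 1 = euphemistic usage, 0 = literal usage.
train_vecs = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
              [0.1, 1.0], [0.2, 0.9], [0.0, 0.8]]
train_labels = [1, 1, 1, 0, 0, 0]

queries = [[0.95, 0.05], [0.05, 0.95]]
gold = [1, 0]

preds = [knn_predict(q, train_vecs, train_labels, k=3) for q in queries]
score = macro_f1(gold, preds)
print(preds, score)
```

In practice the vectors would come from an encoder rather than being written by hand, and an odd `k` avoids tied votes in the binary setting.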
