CL · Oct 23, 2022

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

CMU
arXiv:2210.12846v1
Originality: Synthesis-oriented
AI Analysis

This work addresses euphemism detection, a domain-specific NLP task, with incremental improvements through data augmentation and ensemble methods.

The paper tackles automatic euphemism detection with EUREKA, an ensemble-based approach that corrects mislabelled data, curates an expanded corpus (EuphAug), and uses kNN-based methods. It ranks first on the public leaderboard of the Euphemism Detection Shared Task with a macro F1 score of 0.881.
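The leaderboard metric is macro F1: F1 is computed separately for each class and then averaged without weighting by class frequency, so the minority class counts as much as the majority class. The pure-Python sketch below assumes binary labels (1 = euphemistic, 0 = literal); the function name `macro_f1` is hypothetical and not taken from the EUREKA repository.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Each class contributes equally, regardless of how many examples it has.
    return sum(f1_scores) / len(f1_scores)


# A classifier that always predicts "euphemistic" gets 75% accuracy here,
# but its macro F1 is only 3/7 because the literal class scores F1 = 0.
print(macro_f1([1, 1, 1, 0], [1, 1, 1, 1]))
```

This is why a degenerate majority-class predictor cannot score well on the leaderboard even when one label dominates the data.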

We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881. Our code is available at https://github.com/sedrickkeh/EUREKA.
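Steps (3) and (4) can be illustrated with a small NumPy sketch. It assumes that a PET's representation is the mean of the encoder's contextual token vectors over the PET span, and that the kNN component takes the fraction of euphemistic labels among the k training sentences closest in cosine similarity, then linearly interpolates that with the classifier's probability. These choices, the names `pet_representation`, `knn_probability` and `combined_probability`, and the weight `lam` are hypothetical illustrations rather than EUREKA's exact implementation; the authors' code is in the linked repository.

```python
import numpy as np


def pet_representation(token_vectors: np.ndarray, pet_start: int, pet_end: int) -> np.ndarray:
    """Mean-pool the contextual vectors of the tokens in the PET span [pet_start, pet_end)."""
    return token_vectors[pet_start:pet_end].mean(axis=0)


def knn_probability(query: np.ndarray, train_vecs: np.ndarray,
                    train_labels: np.ndarray, k: int = 3) -> float:
    """Fraction of the k most cosine-similar training sentences labelled euphemistic (1)."""
    q = query / np.linalg.norm(query)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q                        # cosine similarity to every training sentence
    nearest = np.argsort(-sims)[:k]     # indices of the k most similar sentences
    return float(train_labels[nearest].mean())


def combined_probability(p_model: float, p_knn: float, lam: float = 0.5) -> float:
    """Interpolate classifier and kNN probabilities; lam is a hypothetical mixing weight."""
    return lam * p_knn + (1.0 - lam) * p_model


# Toy 2-D "sentence embeddings": the first two are euphemistic, the last two literal.
train_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_labels = np.array([1, 1, 0, 0])
query = np.array([0.8, 0.2])

p_knn = knn_probability(query, train_vecs, train_labels, k=3)
p_final = combined_probability(p_model=0.4, p_knn=p_knn, lam=0.5)
print(p_knn, p_final)
```

Interpolating rather than replacing the classifier output lets neighbour evidence nudge borderline predictions while the fine-tuned model still dominates when `lam` is small.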
