OCHADAI at SemEval-2022 Task 2: Adversarial Training for Multilingual Idiomaticity Detection
This work addresses the challenge of limited annotated data for multilingual idiomaticity detection. Its contribution is incremental, however, since it applies existing adversarial training methods to a specific task.
The paper tackles the problem of detecting idiomatic expressions in sentences across multiple languages by combining adversarial training with pre-trained multilingual transformers. It achieves competitive results in SubTask A of SemEval-2022 Task 2, ranking 6th in the zero-shot setting and 15th in the one-shot setting.
We propose a multilingual adversarial training model for determining whether a sentence contains an idiomatic expression. Because a key challenge of this task is the limited size of the annotated data, our model relies on pre-trained contextual representations from state-of-the-art multilingual transformer-based language models (i.e., multilingual BERT and XLM-RoBERTa) and on adversarial training, a training method that further enhances model generalization and robustness. Without relying on any human-crafted features, knowledge bases, or datasets other than the target datasets, our model achieved competitive results, ranking 6th in the SubTask A zero-shot setting and 15th in the SubTask A one-shot setting.
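To make the idea of adversarial training concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which adversarial training variant is used, so this example assumes a simple gradient-based perturbation of the input features with an L2 budget `epsilon`. A logistic regression classifier stands in for the transformer encoder, and the four-example `toy_data` set, the function names, and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    # Probability that x belongs to the positive ("idiomatic") class.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def adversarial_example(w, b, x, y, epsilon):
    # For logistic regression, the gradient of the cross-entropy loss
    # with respect to the input is (p - y) * w.
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)
    # Move the input by an L2 step of length epsilon in the direction
    # that increases the loss.
    return [xi + epsilon * g / norm for xi, g in zip(x, grad)]


def train(data, epochs=200, lr=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # The perturbed input is treated as a constant during the update.
            x_adv = adversarial_example(w, b, x, y, epsilon)
            err_clean = predict(w, b, x) - y
            err_adv = predict(w, b, x_adv) - y
            # One gradient step on the sum of the clean and adversarial losses.
            grad_w = [err_clean * xi + err_adv * ai for xi, ai in zip(x, x_adv)]
            grad_b = err_clean + err_adv
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


# Hypothetical toy features standing in for sentence representations.
toy_data = [
    ([1.0, 2.0], 1),
    ([2.0, 1.5], 1),
    ([-1.0, -2.0], 0),
    ([-2.0, -1.0], 0),
]
w, b = train(toy_data)
```

In a setting like the one the abstract describes, the perturbation would more plausibly be applied to the embeddings of a pre-trained model such as multilingual BERT or XLM-RoBERTa rather than to raw feature vectors, but the training loop keeps the same shape: compute a loss-increasing perturbation, then update the model on both the clean and the perturbed input.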