CL | AI | Dec 2, 2024

Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi

arXiv:2412.01408v311.920 citations | h-index: 7 | Has Code | COLING

Originality: Synthesis-oriented

AI Analysis

The paper addresses the detection of abusive content in audio for low-resource languages. The contribution is incremental: it applies existing methods to a new domain.

The paper tackles cross-lingual audio abuse detection in low-resource settings by combining pre-trained audio representations with few-shot learning. It classifies abusive speech in 10 languages and experiments with shot sizes from 50 to 200.

Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, a feature visualization study was conducted to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
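
To make the MAML idea in the abstract concrete, here is a minimal sketch and not the authors' implementation. It assumes a linear (logistic) classifier on top of frozen embeddings and uses the first-order approximation of MAML, which ignores second-order gradient terms. Synthetic vectors stand in for Wav2Vec or Whisper features, each synthetic task plays the role of one language, and every function name, dimension, and hyperparameter below is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grad(w, X, y):
    """Binary cross-entropy loss and its gradient for a linear classifier."""
    p = sigmoid(X @ w)
    eps = 1e-9
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def make_task(rng, shared, n, spread=0.3):
    """Hypothetical stand-in for one language: synthetic 'embeddings' whose
    labels follow a shared direction plus a task-specific perturbation."""
    direction = shared + spread * rng.normal(size=shared.shape)
    X = rng.normal(size=(n, shared.size))
    y = (X @ direction > 0).astype(float)
    return X, y


def adapt(w, X, y, inner_lr=0.1):
    """Inner loop: one gradient step on a task's support set."""
    _, g = loss_and_grad(w, X, y)
    return w - inner_lr * g


def fomaml(tasks, dim, shots, inner_lr=0.1, outer_lr=0.1, steps=100, seed=0):
    """First-order MAML: learn an initialization that adapts well per task."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(steps):
        meta_grad = np.zeros(dim)
        for X, y in tasks:
            idx = rng.permutation(len(y))
            support, query = idx[:shots], idx[shots:2 * shots]
            w_task = adapt(w, X[support], y[support], inner_lr)
            # First-order approximation: use the query gradient at the
            # adapted weights directly as the meta-gradient.
            _, g_query = loss_and_grad(w_task, X[query], y[query])
            meta_grad += g_query
        w -= outer_lr * meta_grad / len(tasks)
    return w


def accuracy(w, X, y):
    return float(np.mean((X @ w > 0) == (y > 0.5)))
```

In use, one would meta-train on several tasks, then call `adapt` with a small support set from an unseen task and score the remaining examples with `accuracy`. The shot size of 50 to 200 mentioned in the abstract corresponds to the `shots` argument here; real experiments would replace `make_task` with embeddings extracted from audio clips.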
