CL
Dec 14, 2024

Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages

Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya

arXiv:2412.10805v18.211 citations
h-index: 26
NAACL

Originality: Incremental advance

AI Analysis

This paper addresses the robustness of language models, a concern for NLP practitioners, but the advance is incremental: it extends existing perturbation studies to linguistically grounded attacks.

The paper examines whether pre-trained language models are agnostic to linguistically grounded perturbations. Through a case study on Indic languages, it finds that the models are susceptible to such perturbations, though slightly less so than to non-linguistic attacks.

Pre-trained language models (PLMs) are known to be susceptible to perturbations to the input text, but existing works do not explicitly focus on linguistically grounded attacks, which are subtle and more prevalent in nature. In this paper, we study whether PLMs are agnostic to linguistically grounded attacks or not. To this end, we offer the first study addressing this, investigating different Indic languages and various downstream tasks. Our findings reveal that although PLMs are susceptible to linguistic perturbations, when compared to non-linguistic attacks, PLMs exhibit a slightly lower susceptibility to linguistic attacks. This highlights that even constrained attacks are effective. Moreover, we investigate the implications of these outcomes across a range of languages, encompassing diverse language families and different scripts.
