Measuring Adversarial Datasets
This work addresses the need for better measurement of adversarial datasets in NLP as a step toward improving model robustness. Its contribution is incremental: it surveys and compares existing metrics rather than introducing new methods.
The paper examines how adversarial datasets differ from their original data in NLP. It systematically surveys metrics for difficulty, diversity, and disagreement, then compares the metric distributions of original datasets with those of their adversarial counterparts. The comparison yields insights into what makes these datasets challenging and whether they align with their underlying assumptions.
In an era of widespread public use of AI systems across many domains, adversarial robustness has become increasingly vital for maintaining safety and preventing undesirable errors. Researchers have curated various adversarial datasets (through perturbations) to capture model deficiencies that standard benchmark datasets cannot reveal. However, little is known about how these adversarial examples differ from the original data points, and there is still no methodology for measuring the intended and unintended consequences of those adversarial transformations. In this research, we conducted a systematic survey of existing quantifiable metrics that describe text instances in NLP tasks along the dimensions of difficulty, diversity, and disagreement. We selected several current adversarial effect datasets and compared the metric distributions of the original data with those of their adversarial counterparts. The results provide valuable insights into what makes these datasets more challenging from a metrics perspective and whether they align with underlying assumptions.
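The comparison of metric distributions described above can be illustrated with a minimal sketch. It is not the paper's method: the two metrics (token count as a crude difficulty proxy, type-token ratio as a crude diversity proxy), the two-sample Kolmogorov-Smirnov statistic used to compare distributions, and the toy sentences are all assumptions chosen only for illustration.

```python
from bisect import bisect_right


def token_length(text):
    """Whitespace token count; an assumed, very rough difficulty proxy."""
    return len(text.split())


def type_token_ratio(text):
    """Unique tokens divided by total tokens; an assumed diversity proxy."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def compare(original, adversarial, metric):
    """Score both sets of texts with `metric` and compare the distributions."""
    return ks_statistic(
        [metric(t) for t in original],
        [metric(t) for t in adversarial],
    )


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from any dataset in the paper.
    original = ["the movie was great", "a dull and slow film"]
    adversarial = [
        "the movie was great but not really great at all",
        "a dull and slow film that is not dull and not slow",
    ]
    for metric in (token_length, type_token_ratio):
        print(metric.__name__, compare(original, adversarial, metric))
```

A statistic near 0 suggests the perturbation left that metric's distribution largely unchanged, while a value near 1 suggests a strong shift. A real analysis would substitute the metrics surveyed in the paper and use far larger samples.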