LG AI Jul 17, 2023

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

arXiv:2307.08327v22.06 citations h-index: 12

Originality: Synthesis-oriented

AI Analysis

This work addresses security concerns for users of explainable AI in domains such as text classification, but it is incremental: it applies known adversarial methods to interpretability analysis.

The paper tackles the problem of adversarial attacks on text classification models by analyzing their impact on model interpretability. It finds that adversarial perturbations degrade classification performance and alter the model's explanations.

Adversarial attacks are attacks on machine learning models in which an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. They can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to craft samples that make a model produce predictions it was not intended to make. In this work, we analyze the impact of adversarial attacks on model interpretability in text classification problems. We develop an ML-based classification model for text data. Then, we introduce adversarial perturbations to the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack.
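The pipeline described in the abstract (train a text classifier, perturb the input, then compare predictions and explanations before and after) can be illustrated with a minimal sketch. The abstract does not name the classifier, the attack, or the explanation method, so everything here is an assumption made for illustration: a toy naive Bayes model, a character-typo perturbation, per-word log-odds as the explanation, and the hypothetical `TRAIN` data and helper names. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy training set (1 = positive, 0 = negative); not from the paper.
TRAIN = [
    ("great movie loved it", 1),
    ("wonderful acting great plot", 1),
    ("loved the wonderful story", 1),
    ("terrible movie hated it", 0),
    ("awful acting terrible plot", 0),
    ("hated the awful story", 0),
]


def train_naive_bayes(data):
    """Return per-word log-odds (positive vs. negative) with add-one smoothing.

    Class priors are omitted because the toy classes are balanced.
    """
    counts = {0: Counter(), 1: Counter()}
    for text, label in data:
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    return {
        w: math.log((counts[1][w] + 1) / totals[1])
        - math.log((counts[0][w] + 1) / totals[0])
        for w in vocab
    }


def explain(weights, text):
    """Per-word contribution to the positive-class score.

    Words the model has never seen contribute 0.0.
    """
    return [(w, weights.get(w, 0.0)) for w in text.split()]


def predict(weights, text):
    """Predict 1 if the summed word contributions are positive, else 0."""
    score = sum(contribution for _, contribution in explain(weights, text))
    return 1 if score > 0 else 0


def perturb(text, swaps):
    """Replace selected words with misspelled variants (a simple typo attack)."""
    return " ".join(swaps.get(w, w) for w in text.split())


weights = train_naive_bayes(TRAIN)
original = "great plot wonderful acting awful story"
adversarial = perturb(original, {"great": "grreat", "wonderful": "wonderfull"})

for label, text in (("before", original), ("after", adversarial)):
    print(label, "| prediction:", predict(weights, text))
    for word, contribution in explain(weights, text):
        print(f"    {word:12s} {contribution:+.3f}")
```

In this toy setting the two misspellings push the positive words out of the vocabulary, so their contributions drop to zero, the prediction flips, and the explanation is dominated by the remaining negative word. A real study would use a stronger classifier and dedicated attack and explanation tooling.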
