CL AI CY LG Jan 26, 2025

Evaluating the Effectiveness of XAI Techniques for Encoder-Based Language Models

Melkamu Abay Mersha, Mesay Gemeda Yigezu, Jugal Kalita

arXiv:2501.15374v112.016 citations h-index: 10 Knowledge-Based Systems

Originality: Incremental advance

AI Analysis

This work addresses how to evaluate XAI techniques, a problem relevant to NLP researchers and practitioners. It provides a systematic framework and a comparative analysis, but the advance is incremental: it builds on existing methods without introducing new paradigms.

This study tackled the challenge of evaluating eXplainable AI (XAI) techniques for encoder-based language models by developing a general evaluation framework with four metrics and testing six techniques across five models and two datasets. The results showed that LIME consistently outperformed the other techniques on multiple metrics, achieving a Human-reasoning Agreement score of 0.9685 on DeBERTa-xlarge, while AMV excelled in Robustness, with scores as low as 0.0020, and in Consistency, with scores up to 0.9999.

The black-box nature of large language models (LLMs) necessitates the development of eXplainable AI (XAI) techniques for transparency and trustworthiness. However, evaluating these techniques remains a challenge. This study presents a general evaluation framework using four key metrics: Human-reasoning Agreement (HA), Robustness, Consistency, and Contrastivity. We assess the effectiveness of six explainability techniques from five different XAI categories: model simplification (LIME), perturbation-based methods (SHAP), gradient-based approaches (InputXGradient, Grad-CAM), Layer-wise Relevance Propagation (LRP), and attention mechanisms-based explainability methods (Attention Mechanism Visualization, AMV). The techniques are evaluated across five encoder-based language models (TinyBERT, BERT-base, BERT-large, XLM-R large, and DeBERTa-xlarge) using the IMDB Movie Reviews and Tweet Sentiment Extraction (TSE) datasets. Our findings show that the model simplification-based XAI method (LIME) consistently outperforms the others across multiple metrics and models. It excels in HA, with a score of 0.9685 on DeBERTa-xlarge, and also in Robustness and Consistency as the complexity of the language models increases. AMV demonstrates the best Robustness, with scores as low as 0.0020. It also excels in Consistency, achieving near-perfect scores of 0.9999 across all models. Regarding Contrastivity, LRP performs the best, particularly on more complex models, with scores up to 0.9371.
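The abstract names the four metrics but does not define them, so the following is only an illustrative sketch of how metrics of this kind might be computed over per-token attribution scores. Every function name and formula here is an assumption made for illustration and is not taken from the paper: HA is approximated as precision among the top-k attributed tokens against a human-marked rationale mask, Robustness as the mean absolute change in attributions under a perturbation (lower is better, which matches the abstract reporting the best Robustness as a low score), Consistency as cosine similarity between two attribution vectors, and Contrastivity as one minus the cosine similarity between attributions for two different classes.

```python
import math
from typing import Sequence


def human_agreement(attributions: Sequence[float], rationale: Sequence[int]) -> float:
    """Assumed form of HA: precision of the top-k attributed tokens,
    where `rationale` is a 0/1 human-annotation mask and k is its sum."""
    k = sum(rationale)
    if k == 0:
        return 0.0
    # Rank token indices by absolute attribution, largest first.
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    hits = sum(rationale[i] for i in ranked[:k])
    return hits / k


def robustness(original: Sequence[float], perturbed: Sequence[float]) -> float:
    """Assumed form of Robustness: mean absolute change in attributions
    after a small input perturbation (lower means more robust)."""
    return sum(abs(a - b) for a, b in zip(original, perturbed)) / len(original)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def consistency(run_a: Sequence[float], run_b: Sequence[float]) -> float:
    """Assumed form of Consistency: similarity of attributions produced
    for the same input under two settings (higher is better)."""
    return cosine(run_a, run_b)


def contrastivity(attr_class_a: Sequence[float], attr_class_b: Sequence[float]) -> float:
    """Assumed form of Contrastivity: how different the attributions are
    for two different predicted classes (higher is more contrastive)."""
    return 1.0 - cosine(attr_class_a, attr_class_b)
```

With definitions like these, each technique (LIME, SHAP, and so on) would yield one attribution vector per input, and the scores would be averaged over a dataset. The paper's actual formulas may differ substantially from these stand-ins.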
