CL LGOct 14, 2024

A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets

Nikolaos Mylonas, Nikolaos Stylianou, Theodora Tsikrika, Stefanos Vrochidis, Ioannis Kompatsiaris

arXiv:2410.10290v11.02 citationsh-index: 33

Originality Incremental advance

AI Analysis

This work addresses the need for more understandable AI explanations for non-expert users in specific text classification domains, though it is incremental as it builds on existing interpretability concepts.

The paper tackled the problem of making AI interpretability more accessible by developing a pipeline for text classification that provides predictions and natural language explanations, tested on sentiment analysis and offensive language identification in Greek tweets with promising user evaluation results.

Interpretability is a topic that has been in the spotlight for the past few years. Most existing interpretability techniques produce interpretations in the form of rules or feature importance. These interpretations, while informative, may be harder to understand for non-expert users and therefore, cannot always be considered as adequate explanations. To that end, explanations in natural language are often preferred, as they are easier to comprehend and also more presentable to end-users. This work introduces an early concept for a novel pipeline that can be used in text classification tasks, offering predictions and explanations in natural language. It comprises of two models: a classifier for labelling the text and an explanation generator which provides the explanation. The proposed pipeline can be adopted by any text classification task, given that ground truth rationales are available to train the explanation generator. Our experiments are centred around the tasks of sentiment analysis and offensive language identification in Greek tweets, using a Greek Large Language Model (LLM) to obtain the necessary explanations that can act as rationales. The experimental evaluation was performed through a user study based on three different metrics and achieved promising results for both datasets.

View on arXiv PDF

Similar