Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?
This work addresses the need for explainable AI in LLM classification tasks, particularly under practical constraints such as black-box models and expensive API calls, though its contribution appears incremental.
The researchers study whether incorporating counterfactuals into an LLM's reasoning helps it identify the words most responsible for its classification decisions. Using their decision changing rate framework, they find that counterfactuals can help.
Large language models (LLMs) are becoming useful in many domains due to their impressive abilities, which arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual classification tasks, motivating the need to explain the LLMs' decisions. Motivated by practical constraints, where LLMs are black boxes and LLM calls are expensive, we study how incorporating counterfactuals into LLM reasoning affects the LLM's ability to identify the top words that contributed to its classification decision. To this end, we introduce a framework called the decision changing rate, which helps us quantify the importance of the top words in classification. Our experimental results show that using counterfactuals can be helpful.
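The abstract does not spell out how the decision changing rate is computed. The following is a minimal sketch, assuming the rate is the fraction of inputs whose predicted label flips after the words flagged as most important are deleted. This is an illustrative reading, not the paper's exact definition. `decision_changing_rate`, `remove_words`, and `toy_classifier` are hypothetical names, and the keyword rule in `toy_classifier` stands in for a black-box LLM call.

```python
from typing import Callable, Sequence

# Hypothetical stand-in type for a black-box LLM classifier: text -> label.
Classifier = Callable[[str], str]


def remove_words(text: str, words: set[str]) -> str:
    """Drop every whitespace-separated token whose lowercase form is in `words`."""
    return " ".join(t for t in text.split() if t.lower() not in words)


def decision_changing_rate(
    texts: Sequence[str],
    top_words: Sequence[Sequence[str]],
    classify: Classifier,
) -> float:
    """Fraction of texts whose predicted label changes once their top words are removed.

    ASSUMPTION: one plausible reading of the "decision changing rate",
    not the paper's exact definition.
    """
    if len(texts) != len(top_words):
        raise ValueError("need one list of top words per text")
    if not texts:
        return 0.0
    changed = 0
    for text, words in zip(texts, top_words):
        original = classify(text)  # one (expensive) black-box call
        perturbed = classify(remove_words(text, {w.lower() for w in words}))
        if original != perturbed:
            changed += 1
    return changed / len(texts)


def toy_classifier(text: str) -> str:
    # Hypothetical keyword rule standing in for an LLM call.
    return "positive" if "great" in text.lower().split() else "negative"


texts = ["The movie was great", "A great cast but a dull plot"]
explanations = [["great"], ["dull"]]  # hypothetical "top words" per text
print(decision_changing_rate(texts, explanations, toy_classifier))  # 0.5
```

Under this reading, a higher rate suggests the flagged words matter more to the decision. Comparing the rates for explanations produced with and without counterfactual reasoning would be one way to test whether counterfactuals help. Each input costs two classifier calls, which matters when LLM calls are expensive.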