Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
This work addresses the need for reliable explanations in multilingual NLP applications, though the contribution is incremental: it extends existing monolingual faithfulness research to multilingual contexts.
The study investigated the faithfulness of feature attribution explanations in multilingual versus monolingual fine-tuned language models, finding that larger multilingual models produce less faithful explanations than their monolingual counterparts, with tokenizer differences identified as a potential driver.
In many real-world natural language processing applications, practitioners aim not only to maximize predictive performance but also to obtain faithful explanations for model predictions. Rationales and importance distributions given by feature attribution methods (FAs) provide insights into how different parts of the input contribute to a prediction. Previous studies have explored how different factors affect faithfulness, mainly in the context of monolingual English models. However, the differences in FA faithfulness between multilingual and monolingual models have yet to be explored. Our extensive experiments, covering five languages and five popular FAs, show that FA faithfulness varies between multilingual and monolingual models. We find that the larger the multilingual model, the less faithful its FAs are compared to those of its monolingual counterparts. Our further analysis shows that the faithfulness disparity is potentially driven by the differences between model tokenizers. Our code is available at https://github.com/casszhao/multilingual-faith.
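To make the notion of FA faithfulness concrete, the following is a minimal sketch of one common removal-based check, often called comprehensiveness: delete the tokens an FA ranks highest and measure how much the predicted probability drops. It is an illustration only. The abstract does not state which faithfulness metrics the study uses, and the names `comprehensiveness`, `toy_predict_proba`, and the example scores are hypothetical.

```python
from typing import Callable, Sequence


def comprehensiveness(
    predict_proba: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    scores: Sequence[float],
    k: int,
) -> float:
    """Drop in predicted-class probability after removing the k top-scored tokens."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Indices of the k tokens that the attribution method ranks as most important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens) - predict_proba(reduced)


# Hypothetical toy classifier (an assumption for illustration, not a real model):
# the probability grows with the number of "good" tokens in the input.
def toy_predict_proba(tokens: Sequence[str]) -> float:
    return min(1.0, 0.2 + 0.4 * sum(t == "good" for t in tokens))


tokens = ["a", "good", "good", "film"]
faithful_scores = [0.0, 0.9, 0.8, 0.1]    # ranks the decisive tokens first
unfaithful_scores = [0.9, 0.0, 0.1, 0.8]  # ranks irrelevant tokens first

faithful_drop = comprehensiveness(toy_predict_proba, tokens, faithful_scores, k=2)
unfaithful_drop = comprehensiveness(toy_predict_proba, tokens, unfaithful_scores, k=2)
```

Under this sketch, a larger drop suggests that the attribution scores point at tokens the model actually relies on, so `faithful_drop` exceeds `unfaithful_drop` for the toy classifier. Because the check operates on tokens, two models with different tokenizers would be removing different units of text, which is one way tokenizer differences could plausibly affect such comparisons.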