CL, LG | Aug 25, 2025

CausalSent: Interpretable Sentiment Classification with RieszNet

arXiv:2508.17576v2 | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses interpretability in NLP for researchers and practitioners. It is rated incremental because it extends existing methods with a new architecture and dataset.

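The paper addresses black-box decision-making in NLP models by developing CausalSent, a two-headed RieszNet-based architecture for interpretable sentiment classification. On semi-synthetic IMDB movie reviews, it reports a 2-3x lower MAE of effect estimates than the MAE Bansal et al. obtained on synthetic Civil Comments data; note that the two figures come from different datasets. In an observational case study with an ensemble of validated models, it estimates that the presence of the word "love" causes a +2.9% increase in the probability of positive sentiment in IMDB reviews.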

Despite the overwhelming performance improvements offered by recent natural language processing (NLP) models, the decisions made by these models are largely a black box. Toward closing this gap, the field of causal NLP combines causal inference literature with modern NLP models to elucidate causal effects of text features. We replicate and extend Bansal et al.'s work on regularizing text classifiers to adhere to estimated effects, focusing instead on model interpretability. Specifically, we focus on developing a two-headed RieszNet-based neural network architecture which achieves better treatment effect estimation accuracy. Our framework, CausalSent, accurately predicts treatment effects in semi-synthetic IMDB movie reviews, reducing MAE of effect estimates by 2-3x compared to Bansal et al.'s MAE on synthetic Civil Comments data. With an ensemble of validated models, we perform an observational case study on the causal effect of the word "love" in IMDB movie reviews, finding that the presence of the word "love" causes a +2.9% increase in the probability of a positive sentiment.
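To make the "two-headed" idea concrete, here is a minimal NumPy sketch of the quantities a RieszNet-style model typically estimates for a binary treatment (for example, whether a word is present): an outcome head g(t, x) and a Riesz representer head alpha(t, x). It is not CausalSent's actual implementation, which the abstract does not detail. The function names, the toy data-generating process, and the value `tau=0.029` are all assumptions made for illustration, and the "heads" here are oracle formulas rather than trained networks.

```python
import numpy as np


def plugin_ate(g1, g0):
    """Plug-in average treatment effect: mean of g(1, x) - g(0, x)."""
    return float(np.mean(g1 - g0))


def riesz_loss(alpha_obs, alpha1, alpha0):
    """Riesz loss for the ATE functional:
    mean(alpha(T, X)^2 - 2 * (alpha(1, X) - alpha(0, X)))."""
    return float(np.mean(alpha_obs ** 2 - 2.0 * (alpha1 - alpha0)))


def doubly_robust_ate(y, g_obs, g1, g0, alpha_obs):
    """Plug-in estimate plus a correction weighted by the Riesz representer."""
    return float(np.mean(g1 - g0 + alpha_obs * (y - g_obs)))


def true_riesz(t, propensity):
    """Closed-form ATE representer: t / e(x) - (1 - t) / (1 - e(x))."""
    return t / propensity - (1.0 - t) / (1.0 - propensity)


def simulate(n=20000, tau=0.029, seed=0):
    """Hypothetical toy data with a known effect `tau` (not the paper's data)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)              # a single confounder
    e = 0.3 + 0.4 * x                              # propensity of treatment
    t = (rng.uniform(size=n) < e).astype(float)    # binary treatment
    y = 0.5 * x + tau * t + rng.normal(0.0, 0.1, size=n)

    # Oracle "heads": stand-ins for what trained network heads would output.
    g1, g0 = 0.5 * x + tau, 0.5 * x
    g_obs = np.where(t == 1.0, g1, g0)
    alpha_obs = true_riesz(t, e)
    alpha1 = true_riesz(np.ones(n), e)
    alpha0 = true_riesz(np.zeros(n), e)

    return {
        "tau": tau,
        "plugin": plugin_ate(g1, g0),
        "dr": doubly_robust_ate(y, g_obs, g1, g0, alpha_obs),
        "riesz_loss": riesz_loss(alpha_obs, alpha1, alpha0),
    }


if __name__ == "__main__":
    out = simulate()
    # Absolute error against the known effect; averaging this over several
    # semi-synthetic settings would give an MAE of effect estimates.
    print("plug-in abs error:", abs(out["plugin"] - out["tau"]))
    print("doubly robust abs error:", abs(out["dr"] - out["tau"]))
    print("Riesz loss at the true representer:", out["riesz_loss"])
```

Because the toy effect is known by construction, the absolute error of each estimate can be computed directly, which mirrors why semi-synthetic data is useful for validating effect estimators before applying them to observational text.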
