UPLME: Uncertainty-Aware Probabilistic Language Modelling for Robust Empathy Regression
This work addresses label noise in empathy regression, a domain-specific issue in natural language processing, with incremental improvements over existing methods.
The paper tackles noisy self-reported empathy scores in supervised empathy regression by proposing UPLME, an uncertainty-aware probabilistic language modelling framework. UPLME achieves state-of-the-art performance on two benchmarks, improving the Pearson Correlation Coefficient from 0.558 to 0.580 and from 0.629 to 0.634.
Supervised learning for empathy regression is challenged by noisy self-reported empathy scores. While many algorithms have been proposed for learning with noisy labels in textual classification problems, the regression counterpart is relatively under-explored. We propose UPLME, an uncertainty-aware probabilistic language modelling framework to capture label noise in the regression setting of empathy detection. UPLME includes a probabilistic language model that predicts both the empathy score and its heteroscedastic uncertainty, and is trained using Bayesian concepts with variational model ensembling. We further introduce two novel loss components: one penalises degenerate Uncertainty Quantification (UQ), and the other enforces similarity between the input pairs on which we predict empathy. On two public benchmarks that contain label noise, UPLME achieves state-of-the-art performance relative to results reported in the literature (Pearson Correlation Coefficient: $0.558\rightarrow0.580$ and $0.629\rightarrow0.634$). Through synthetic label noise injection, we show that UPLME is effective in separating noisy and clean samples based on the predicted uncertainty. UPLME further outperforms a recent variational model ensembling-based UQ method designed for regression problems (calibration error: $0.571\rightarrow0.376$).
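The abstract describes a model that predicts both a score and a heteroscedastic (per-sample) uncertainty, combined over an ensemble of stochastic forward passes and evaluated with the Pearson Correlation Coefficient. The following is a minimal NumPy sketch of those generic building blocks, not the authors' implementation. It assumes a Gaussian likelihood with a predicted log-variance and the standard total-variance decomposition for ensembling. The function names (`gaussian_nll`, `ensemble_predict`, `pearson`) are hypothetical, and the two novel loss components are not reproduced because the abstract does not define them.

```python
import numpy as np


def gaussian_nll(y, mean, log_var):
    """Per-sample Gaussian negative log-likelihood with a predicted variance.

    Assumption: the model outputs log-variance, which keeps the variance positive.
    """
    y, mean, log_var = (np.asarray(v, dtype=float) for v in (y, mean, log_var))
    return 0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var) + np.log(2 * np.pi))


def ensemble_predict(means, log_vars):
    """Combine M stochastic forward passes, each of shape [M, N].

    Returns the averaged prediction and the total predictive variance:
    mean of the predicted variances (aleatoric part) plus the variance
    of the predicted means across passes (epistemic part).
    """
    means = np.asarray(means, dtype=float)
    variances = np.exp(np.asarray(log_vars, dtype=float))
    prediction = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)
    epistemic = means.var(axis=0)
    return prediction, aleatoric + epistemic


def pearson(a, b):
    """Pearson Correlation Coefficient between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

Under this likelihood, a sample with a large residual incurs a smaller loss when the model assigns it a larger variance, which is why predicted uncertainty can serve as a signal for separating noisy from clean labels. The same property allows degenerate solutions in which the variance is inflated everywhere, the kind of behaviour the abstract's UQ penalty is meant to discourage.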