LG · AI · Mar 26, 2025

Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Lynnette Hui Xian Ng, Kokil Jaidka, Kaiyuan Tay, Hansin Ahuja, Niyati Chhaya

arXiv:2503.21000v24.12 citations · h-index: 25 · Proc. ACM Hum. Comput. Interact.

Originality: Incremental advance

AI Analysis

This work aims to improve the accuracy of user behavior prediction in applications that rely on crowdsourced annotations. The advance is incremental, since it builds on existing ensemble methods.

The paper tackles poor user behavior prediction from conversational text, which it attributes to low-quality crowdsourced labels. It introduces MSWEEM, an ensemble model that integrates annotator metadata such as fatigue and speeding, and reports improvements over standard ensembles of 14% on held-out data and 12% on an alternative dataset.

Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
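To make the idea of using annotator behavior signals concrete, here is a minimal sketch of metadata-weighted label aggregation. It is not the MSWEEM architecture, which the abstract does not specify; it swaps in a much simpler technique (a weighted vote over crowdsourced labels) to illustrate how speeding and fatigue could down-weight an annotation. The function names, the field names (`seconds_spent`, `items_done_before`), and the parameters `min_seconds` and `fatigue_halflife` are all hypothetical.

```python
from collections import defaultdict


def annotation_weight(seconds_spent, items_done_before,
                      min_seconds=5.0, fatigue_halflife=100):
    """Hypothetical weight in [0, 1] that penalizes speeding and fatigue.

    min_seconds and fatigue_halflife are illustrative assumptions,
    not values taken from the paper.
    """
    # Speeding: annotations finished faster than min_seconds count for less.
    speed_factor = min(1.0, seconds_spent / min_seconds)
    # Fatigue: the weight halves every fatigue_halflife items already labeled.
    fatigue_factor = 0.5 ** (items_done_before / fatigue_halflife)
    return speed_factor * fatigue_factor


def aggregate_labels(annotations):
    """Pick, per item, the label with the largest total annotation weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for a in annotations:
        w = annotation_weight(a["seconds_spent"], a["items_done_before"])
        scores[a["item_id"]][a["label"]] += w
    return {item: max(by_label, key=by_label.get)
            for item, by_label in scores.items()}


# Two rushed votes for "A" (1 second each) against one careful vote for "B".
demo = [
    {"item_id": "q1", "label": "A", "seconds_spent": 1.0, "items_done_before": 0},
    {"item_id": "q1", "label": "A", "seconds_spent": 1.0, "items_done_before": 0},
    {"item_id": "q1", "label": "B", "seconds_spent": 10.0, "items_done_before": 0},
]
result = aggregate_labels(demo)
```

In this toy input an unweighted majority vote would choose "A", whereas the weighted vote favors the single careful annotation. A learned model could consume the same signals as features or sample weights instead of a fixed formula.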
