Chain-of-Thought Embeddings for Stance Detection on Social Media
This work addresses stance detection for social media analysis, but the contribution is incremental: it builds on existing Chain-of-Thought prompting and RoBERTa-based methods.
The paper tackles stance detection on social media, where slang and colloquial language often leave stance implicit and hinder Large Language Models. It introduces COT Embeddings, which integrate Chain-of-Thought reasoning into a RoBERTa-based pipeline, and reports state-of-the-art performance on multiple datasets.
Stance detection on social media is challenging for Large Language Models (LLMs), because emerging slang and colloquial language in online conversations often convey stance only implicitly. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks, alleviating some of these issues. However, COT prompting still struggles with implicit stance identification. This challenge arises because many samples are hard to interpret until a model becomes familiar with the slang and evolving knowledge related to different topics, all of which must be acquired from the training data. In this study, we address this problem by introducing COT Embeddings, which improve COT performance on stance detection tasks by embedding COT reasonings and integrating them into a traditional RoBERTa-based stance detection pipeline. Our analysis demonstrates that 1) text encoders can leverage COT reasonings that contain minor errors or hallucinations that would otherwise distort the COT output label, and 2) text encoders can overlook misleading COT reasoning when a sample's prediction depends heavily on domain-specific patterns. Our model achieves state-of-the-art (SOTA) performance on multiple stance detection datasets collected from social media.
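To make the idea of embedding COT reasonings and feeding them to a trained stance classifier more concrete, the following is a minimal sketch, not the paper's implementation. It rests on several assumptions that do not come from the abstract: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a RoBERTa encoder, the fusion step is plain concatenation, the classification head is a small softmax layer trained with SGD, and the three examples and the label set are invented for illustration.

```python
import hashlib
import math

DIM = 16  # toy size; a real RoBERTa encoder outputs much larger vectors
LABELS = ["against", "favor", "neutral"]  # assumed label set


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in for a text encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


def fuse(post, cot_reasoning):
    """Concatenate the post embedding with the COT reasoning embedding (assumed fusion)."""
    return toy_encode(post) + toy_encode(cot_reasoning)


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(model, features):
    weights, bias = model
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


def train(examples, epochs=50, lr=0.5):
    """Fit a softmax classification head on the fused features with plain SGD."""
    weights = [[0.0] * (2 * DIM) for _ in LABELS]
    bias = [0.0] * len(LABELS)
    for _ in range(epochs):
        for post, cot, label in examples:
            x = fuse(post, cot)
            probs = softmax(logits((weights, bias), x))
            for k, name in enumerate(LABELS):
                grad = probs[k] - (1.0 if name == label else 0.0)  # cross-entropy gradient
                bias[k] -= lr * grad
                weights[k] = [w - lr * grad * xi for w, xi in zip(weights[k], x)]
    return weights, bias


def predict(model, post, cot_reasoning):
    scores = logits(model, fuse(post, cot_reasoning))
    return LABELS[scores.index(max(scores))]


# Hypothetical toy data: (post, COT reasoning produced by an LLM, gold stance label).
EXAMPLES = [
    ("new mask rule is a joke smh",
     "The author mocks the rule, so the stance is against.", "against"),
    ("finally some action on climate, love it",
     "The author praises the action, so the stance is favor.", "favor"),
    ("council votes on the budget tonight",
     "The post only reports an event, so the stance is neutral.", "neutral"),
]

model = train(EXAMPLES)
print(predict(model, EXAMPLES[0][0], EXAMPLES[0][1]))
```

Because the classifier is trained on gold labels and sees the COT reasoning only as additional input features, it can in principle learn to rely on the reasoning when it helps and to discount it when the post itself carries a strong domain-specific signal, which is the behavior the analysis in the abstract describes. In a real pipeline, `toy_encode` would be replaced by a fine-tuned RoBERTa encoder, and the fusion strategy may differ from the concatenation assumed here.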