Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines
The study addresses the problem of misinformation detection for the public and researchers, but it is incremental, as it applies existing models to a small, specific dataset.
This study tested the ability of three large language models (ChatGPT-3.5, ChatGPT-4, and Gemini) to identify misleading news headlines using a dataset of 60 articles, finding that ChatGPT-4 achieved superior accuracy, particularly on headlines with unanimous human agreement.
In the digital age, the prevalence of misleading news headlines poses a significant challenge to information integrity, necessitating robust detection mechanisms. This study explores the efficacy of Large Language Models (LLMs) in distinguishing misleading from non-misleading news headlines. Using a dataset of 60 articles sourced from both reputable and questionable outlets across the health, science & tech, and business domains, we employ three LLMs (ChatGPT-3.5, ChatGPT-4, and Gemini) for classification. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy, especially in cases with unanimous annotator agreement on misleading headlines. The study emphasizes the importance of human-centered evaluation in developing LLMs that can navigate the complexities of misinformation detection, aligning technical proficiency with nuanced human judgment. Our findings contribute to the discourse on AI ethics, underscoring the need for models that are not only technically advanced but also ethically aligned and sensitive to the subtleties of human interpretation.
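The comparison of model accuracy across levels of annotator agreement can be illustrated with a minimal sketch. It is not the authors' code. The number of annotators (three), the use of a majority vote as the reference label, the function names, and the toy records are all assumptions made for illustration; none of the values come from the study's dataset.

```python
from collections import defaultdict


def majority_label(votes):
    """Return (label, unanimous) for a list of boolean votes (True = misleading).

    Assumes an odd number of annotators; with an even number, a tie
    resolves to False (non-misleading).
    """
    misleading = sum(votes)
    label = misleading * 2 > len(votes)
    unanimous = misleading in (0, len(votes))
    return label, unanimous


def accuracy_by_agreement(annotations, predictions):
    """Compute accuracy overall and per agreement bucket.

    annotations: {headline_id: [bool, ...]}  hypothetical human votes
    predictions: {headline_id: bool}         hypothetical model outputs
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for headline_id, votes in annotations.items():
        label, unanimous = majority_label(votes)
        bucket = "unanimous" if unanimous else "majority"
        correct = predictions[headline_id] == label
        for key in (bucket, "overall"):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


# Toy data, invented for illustration only.
annotations = {
    "h1": [True, True, True],
    "h2": [True, True, False],
    "h3": [False, False, False],
    "h4": [False, True, False],
}
predictions = {"h1": True, "h2": False, "h3": False, "h4": False}

result = accuracy_by_agreement(annotations, predictions)
print(result)
```

Splitting accuracy by agreement bucket separates headlines that humans judge consistently from those they find ambiguous, which is the distinction the abstract draws when it reports results for unanimous annotator agreement.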