CL · Jul 8, 2025

DS@GT at CheckThat! 2025: Ensemble Methods for Detection of Scientific Discourse on Social Media

Ayush Parikh, Hoang Thanh Thanh Truong, Jeanette Schofield, Maximilian Heil

arXiv:2507.06205v1 · 2.72 citations · h-index: 1 · Has Code · CLEF

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of identifying scientific content in social media posts for researchers and fact-checkers, but it is incremental as it builds on existing methods in a competition setting.

The DS@GT team tackled the CLEF 2025 CheckThat! Task 4a for detecting scientific discourse in tweets, achieving a macro-averaged F1 score of 0.8611, which improved upon the DeBERTaV3 baseline of 0.8375 and placed 7th in the competition.

In this paper, we, as the DS@GT team for CLEF 2025 CheckThat! Task 4a Scientific Web Discourse Detection, present the methods we explored for this task. For this multiclass classification task, we determined whether a tweet contained a scientific claim, a reference to a scientific study or publication, and/or mentions of scientific entities, such as a university or a scientist. We present three modeling approaches for this task: transformer finetuning, few-shot prompting of LLMs, and a combined ensemble model whose design was informed by earlier experiments. Our team placed 7th in the competition, achieving a macro-averaged F1 score of 0.8611, an improvement over the DeBERTaV3 baseline of 0.8375. Our code is available on GitHub at https://github.com/dsgt-arc/checkthat-2025-swd/tree/main/subtask-4a.
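To make the reported metric and the ensemble idea concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes each tweet carries three independent 0/1 labels, that the ensemble is a simple per-label majority vote over model predictions, and that "macro-averaged F1" means the unweighted mean of the per-label F1 scores. The label names and both function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical label names for the three categories described in the abstract.
LABELS = ("scientific_claim", "scientific_reference", "scientific_entity")


def majority_vote(predictions: Sequence[Sequence[Sequence[int]]]) -> List[List[int]]:
    """Combine models by per-label majority vote (an assumed ensemble rule).

    predictions[m][i][k] is model m's 0/1 prediction for tweet i and label k.
    A label is set to 1 only when strictly more than half of the models agree.
    """
    n_models = len(predictions)
    n_items = len(predictions[0])
    n_labels = len(predictions[0][0])
    voted = []
    for i in range(n_items):
        row = []
        for k in range(n_labels):
            votes = sum(predictions[m][i][k] for m in range(n_models))
            row.append(1 if 2 * votes > n_models else 0)
        voted.append(row)
    return voted


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class for one label; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = [
        f1_binary([row[k] for row in y_true], [row[k] for row in y_pred])
        for k in range(n_labels)
    ]
    return sum(scores) / n_labels
```

Given prediction matrices from several models, `macro_f1(gold, majority_vote([model_a, model_b, model_c]))` scores the voted output against the gold labels. Majority voting is only one of several ways to combine finetuned transformers and prompted LLMs; the paper's repository is the reference for what the team actually did.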
