CL SI | May 18, 2024

BrainStorm @ iREL at #SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets

Manav Chaudhary, Harshit Gupta, Vasudeva Varma

arXiv:2405.11192v213.826 citations | h-index: 3 | SMM4H

Originality: Incremental advance

AI Analysis

This work addresses the reliability of LLMs in annotation tasks for public health monitoring, though as a shared task submission it appears incremental.

The paper tackled the problem of distinguishing LLM-generated annotations from human expert annotations in COVID-19 symptom detection tweets in Latin American Spanish, achieving competitive performance in the SMM4H 2024 shared task by leveraging translation and topical embeddings.

The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL's approach to the SMM4H 2024 Shared Task. Leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.
