CL AI LG NE | Feb 24, 2023

HULAT at SemEval-2023 Task 9: Data augmentation for pre-trained transformers applied to Multilingual Tweet Intimacy Analysis

arXiv:2302.12794v1224 citations | h-index: 24 | Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental contribution to NLP for social media analysis, focusing on a specific competition task.

The paper tackled multilingual tweet intimacy analysis by fine-tuning transformer models with data augmentation, achieving modest results with a 27th place ranking out of 45 systems and slight improvements from augmentation.

This paper describes our participation in SemEval-2023 Task 9, Intimacy Analysis of Multilingual Tweets. We fine-tune several popular transformer models on the training dataset together with synthetic data generated by different data augmentation techniques. During the development phase, our best results were obtained with XLM-T. Data augmentation provided only a very slight improvement. Our system ranked 27th out of the 45 participating systems. Despite these modest overall results, it performs promisingly in languages such as Portuguese, English, and Dutch. All our code is available in the repository: https://github.com/isegura/hulat_intimacy
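The abstract does not say which augmentation techniques were used, so the following is only a minimal sketch of one common family: token-level random swap and random deletion applied to (tweet, intimacy score) pairs. The function names, the deletion probability, and the choice to copy the original score onto each synthetic example are all assumptions made for illustration; the authors' actual pipeline is in their repository and may differ.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position swaps."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(examples, copies=1, seed=0):
    """Append `copies` synthetic variants per (text, score) example.

    Assumption: the synthetic text inherits the original score unchanged.
    """
    rng = random.Random(seed)
    out = list(examples)
    for text, score in examples:
        tokens = text.split()
        for _ in range(copies):
            if rng.choice(("swap", "delete")) == "swap":
                new_tokens = random_swap(tokens, 1, rng)
            else:
                new_tokens = random_deletion(tokens, 0.1, rng)
            out.append((" ".join(new_tokens), score))
    return out


# Hypothetical toy data; the real task uses annotated multilingual tweets.
train = [("i miss you so much", 3.5), ("what time is the game", 1.2)]
augmented = augment(train, copies=2, seed=1)
```

The augmented list could then be passed to whatever fine-tuning loop is used for the regression model. Whether perturbations like these preserve the intimacy label is itself an assumption, which may be one reason such techniques yield only small gains.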

Code Implementations: 1 repo
