CL · AI · LG · Apr 6, 2024

IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts

Udvas Basak, Rajarshi Dutta, Shivam Pandey, Ashutosh Modi

arXiv:2404.04513v1 · h-index: 24 · Has Code · SemEval

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of multilingual semantic relatedness for NLP applications. The contribution is incremental, however, because it applies existing methods such as BERT and autoencoders to a specific competition task.

The paper tackles the problem of automatically detecting semantic textual relatedness between sentence pairs in 14 languages, including low-resource ones. For SemEval-2024 Task 1, the authors built a system that uses BERT-based contrastive learning for the supervised track and autoencoders for the unsupervised track. They also refine word embeddings with a bigram relatedness corpus built through negative sampling.
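To make the supervised idea concrete, here is a minimal sketch of a contrastive objective over sentence embeddings, with cosine similarity serving as the relatedness score. It assumes the embeddings have already been produced by an encoder such as BERT. The in-batch InfoNCE loss, the temperature of 0.05, and the function names `cosine_similarity` and `info_nce_loss` are illustrative assumptions, not the authors' published code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two embeddings, used here as the relatedness score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.05) -> float:
    """Generic in-batch contrastive (InfoNCE) loss.

    Assumption: this is a textbook objective, not necessarily the IITK system's loss.
    anchors[i] should be most similar to positives[i]; every positives[j] with
    j != i acts as an in-batch negative for anchors[i].
    """
    assert anchors and len(anchors) == len(positives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the matching positive.
        total += log_denominator - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(batch, batch))                      # aligned pairs: near zero
    print(info_nce_loss(batch, [[0.0, 1.0], [1.0, 0.0]]))   # mismatched pairs: large
```

With aligned pairs the loss is close to zero, while swapping the positives drives it up; training pushes the encoder toward the aligned case. At inference time, the relatedness of a sentence pair would then be read off as the cosine similarity of its two embeddings.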

This paper describes our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness. The challenge focuses on automatically detecting the degree of relatedness between pairs of sentences in 14 languages, including both high- and low-resource Asian and African languages. Our team participated in two subtasks: Track A (supervised) and Track B (unsupervised). This paper focuses on a BERT-based approach that combines contrastive learning with similarity metrics, primarily for the supervised track, while exploring autoencoders for the unsupervised track. It also covers the creation of a bigram relatedness corpus using a negative sampling strategy, which produces refined word embeddings.
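The bigram relatedness corpus can be sketched roughly as follows. This assumes that adjacent word pairs observed in a tokenized corpus are treated as related (label 1) and that negatives are formed by replacing the second word with a word drawn uniformly from the vocabulary (label 0), keeping only pairs never observed as bigrams. The uniform sampling distribution, the corruption scheme, and the name `build_bigram_relatedness_pairs` are hypothetical choices for illustration; the paper's actual sampling strategy may differ.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def build_bigram_relatedness_pairs(
    sentences: List[List[str]], negatives_per_positive: int = 2, seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: return (word1, word2, label) triples.

    label 1: a bigram observed somewhere in the corpus.
    label 0: the observed left word paired with a uniformly sampled word,
             keeping only pairs that never occur as bigrams (assumed scheme).
    """
    rng = random.Random(seed)  # seeded so the sampled corpus is reproducible
    positives: Set[Pair] = set()
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:]):
            positives.add((left, right))

    vocab = sorted({tok for tokens in sentences for tok in tokens})
    examples = [(left, right, 1) for left, right in sorted(positives)]

    # Cap the number of draws so tiny vocabularies cannot loop forever.
    max_attempts = 100 * negatives_per_positive
    for left, _ in sorted(positives):
        drawn, attempts = 0, 0
        while drawn < negatives_per_positive and attempts < max_attempts:
            attempts += 1
            candidate = rng.choice(vocab)
            if (left, candidate) not in positives:
                examples.append((left, candidate, 0))
                drawn += 1
    return examples


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
    for triple in build_bigram_relatedness_pairs(corpus, negatives_per_positive=1):
        print(triple)
```

Word2vec-style negative sampling usually draws negatives from a smoothed unigram distribution rather than uniformly; uniform sampling is used here only to keep the sketch short. The resulting labeled pairs could then serve as training data for refining word embeddings.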
