IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts
This work addresses multilingual semantic relatedness, a problem relevant to many NLP applications, but its contribution is incremental: it adapts existing methods such as BERT and autoencoders to a specific shared-task setting.
The paper tackles automatic detection of semantic textual relatedness between sentence pairs in 14 languages, including low-resource ones, with a system developed for SemEval-2024 Task 1. It uses BERT-based contrastive learning for the supervised track and autoencoders for the unsupervised track, and it builds a bigram relatedness corpus with negative sampling to produce refined word embeddings.
This paper describes our system for SemEval-2024 Task 1: Semantic Textual Relatedness. The task is to automatically determine the degree of relatedness between pairs of sentences in 14 languages, including both high- and low-resource Asian and African languages. Our team participated in two tracks: Track A (supervised) and Track B (unsupervised). This paper focuses primarily on a BERT-based approach that combines contrastive learning with a similarity metric for the supervised track, and it explores autoencoders for the unsupervised track. It also describes the creation of a bigram relatedness corpus using a negative sampling strategy, which yields refined word embeddings.
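To make two of the ingredients above concrete, here is a minimal, dependency-free sketch of (a) a bigram relatedness corpus built with negative sampling and (b) a margin-based contrastive loss on cosine similarity. This is an illustration under stated assumptions, not the authors' implementation: the function names `build_bigram_corpus` and `contrastive_loss`, the whitespace tokenization, the choice of cosine distance, and the `margin` value are all hypothetical. In the described system the vectors would come from a fine-tuned BERT encoder rather than being supplied by hand.

```python
import math
import random
from typing import List, Sequence, Tuple


def build_bigram_corpus(
    sentences: List[str], num_negatives: int = 2, seed: int = 0
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: returns (word, word, label) triples.

    Observed adjacent word pairs (bigrams) are positives with label 1.0.
    For each positive, `num_negatives` pairs (w1, w_neg) that never occur
    as a bigram are drawn at random as negatives with label 0.0.
    """
    rng = random.Random(seed)
    tokenized = [s.lower().split() for s in sentences]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    positives = {
        (toks[i], toks[i + 1]) for toks in tokenized for i in range(len(toks) - 1)
    }
    corpus: List[Tuple[str, str, float]] = []
    for w1, w2 in sorted(positives):
        corpus.append((w1, w2, 1.0))
        drawn, attempts = 0, 0
        # Cap attempts so tiny vocabularies cannot loop forever.
        while drawn < num_negatives and attempts < 100 * num_negatives:
            attempts += 1
            w_neg = rng.choice(vocab)
            if w_neg != w1 and (w1, w_neg) not in positives:
                corpus.append((w1, w_neg, 0.0))
                drawn += 1
    return corpus


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the relatedness score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def contrastive_loss(
    u: Sequence[float], v: Sequence[float], label: float, margin: float = 0.5
) -> float:
    """Margin-based contrastive loss on cosine distance d = 1 - cos(u, v).

    Related pairs (label 1) are pulled together with loss d^2; unrelated
    pairs (label 0) are pushed apart until d exceeds `margin`. A label in
    (0, 1) interpolates between the two terms.
    """
    d = 1.0 - cosine_similarity(u, v)
    return label * d ** 2 + (1.0 - label) * max(0.0, margin - d) ** 2


corpus = build_bigram_corpus(["the cat sat on the mat", "a dog sat on a rug"], num_negatives=1)
print(len(corpus))  # 9 positive bigrams + 9 sampled negatives
```

The negative-sampling step gives the model explicit examples of unrelated word pairs to contrast against observed bigrams. The same loss shape could in principle be applied to sentence pairs, with `u` and `v` taken as encoder outputs; whether the paper uses this exact margin formulation is not stated in the abstract.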