CL · Apr 11, 2022

HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity

Zihang Xu, Ziqing Yang, Yiming Cui, Zhigang Chen

arXiv:2204.04844v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately assessing similarity in multilingual news articles, which is important for applications like information retrieval and cross-lingual content analysis, but it appears incremental as it builds on existing techniques with task-specific adaptations.

The paper tackles the problem of measuring similarity between multilingual news articles by proposing a linguistics-inspired regression model with data augmentation. The system took first place on the SemEval-2022 Task 8 leaderboard with a Pearson correlation coefficient of 0.818 on the official evaluation set.
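For readers unfamiliar with the metric, the Pearson correlation coefficient reported above measures how linearly the predicted similarity scores track the gold scores. The sketch below is a generic, plain-Python illustration, not the task's official scorer; the function name `pearson_r` and the sample scores are hypothetical.

```python
import math


def pearson_r(predicted, gold):
    """Pearson correlation between two equal-length score lists.

    Illustrative helper (hypothetical name), not the official
    SemEval-2022 Task 8 evaluation script.
    """
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n

    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)

    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant input")

    return cov / math.sqrt(var_p * var_g)


# Made-up scores for illustration only.
example = pearson_r([1.0, 2.5, 3.0, 4.0], [1.2, 2.0, 3.1, 3.9])
```

A value of 1 means the predictions are a perfect increasing linear function of the gold scores; the metric ignores any constant offset or scaling between the two.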

This paper describes our system designed for SemEval-2022 Task 8: Multilingual News Article Similarity. We propose a linguistics-inspired model trained with a few task-specific strategies. The main techniques of our system are: 1) data augmentation, 2) multi-label loss, 3) adapted R-Drop, and 4) sample reconstruction with the head-tail combination. We also present a brief analysis of some methods that did not help, such as a two-tower architecture. Our system ranked 1st on the leaderboard, achieving a Pearson's Correlation Coefficient of 0.818 on the official evaluation set.
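The abstract does not spell out how the head-tail combination works. One common reading, assumed here, is that an article too long for the model's input is shortened by keeping tokens from its beginning and its end and dropping the middle. The sketch below illustrates only that assumed reading; the function name `head_tail_truncate`, the `head_ratio` parameter, and its default are hypothetical and may differ from the authors' implementation.

```python
def head_tail_truncate(tokens, max_len, head_ratio=0.5):
    """Keep the first and last tokens of a long sequence.

    Assumption: "head-tail combination" means joining the head and
    the tail of an over-long article. The name and the head_ratio
    parameter are illustrative, not taken from the paper.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0.0 <= head_ratio <= 1.0:
        raise ValueError("head_ratio must be in [0, 1]")

    # Short inputs pass through unchanged.
    if len(tokens) <= max_len:
        return list(tokens)

    head_len = int(max_len * head_ratio)
    tail_len = max_len - head_len

    head = list(tokens[:head_len])
    # Guard tail_len == 0, since tokens[-0:] would return everything.
    tail = list(tokens[len(tokens) - tail_len:]) if tail_len > 0 else []
    return head + tail


# Ten placeholder tokens cut down to four: two from each end.
shortened = head_tail_truncate(list(range(10)), max_len=4)
```

Under this assumption, varying `head_ratio` yields different views of the same article, which is one way such a scheme could be used to reconstruct training samples.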

