SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
This work provides a new dataset and benchmark for multilingual intimacy analysis. The contribution is incremental: it extends existing intimacy research to more languages and to social media text.
The authors address intimacy analysis in multilingual tweets by creating MINT, a dataset of 13,372 tweets across 10 languages, and by benchmarking popular multilingual pre-trained language models on it.
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages: English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmark a set of popular multilingual pre-trained language models on it. The dataset is released along with SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/umich.edu/semeval-2023-tweet-intimacy).
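A benchmark over a multilingual dataset like this is usually reported per language, so the following is a minimal sketch of how a model's predicted intimacy scores might be compared with gold scores language by language. It rests on assumptions that the text above does not state: that intimacy is a real-valued score per tweet, and that Pearson's r is the evaluation metric. The function names, the record layout, and the toy values are hypothetical and are not taken from MINT.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def per_language_pearson(records):
    """Group (language, gold, predicted) triples by language and score each group."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in records:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: pearson_r(gold, pred) for lang, (gold, pred) in by_lang.items()}


# Made-up toy values for illustration only; these are not MINT annotations
# and the "predicted" numbers are not real model outputs.
toy_records = [
    ("English", 1.0, 1.2), ("English", 2.5, 2.4), ("English", 4.0, 3.1),
    ("Korean", 1.5, 2.0), ("Korean", 3.0, 2.8), ("Korean", 4.5, 4.1),
]

scores = per_language_pearson(toy_records)
for lang, r in scores.items():
    print(f"{lang}: r = {r:.3f}")
```

Scoring each language separately, rather than pooling all tweets, keeps a strong result in a high-resource language from hiding a weak one elsewhere.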