CV AI MM
Mar 28, 2022

3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos

Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha

arXiv:2203.14456v19.414 citations
h-index: 102

Originality: Synthesis-oriented

AI Analysis

3MASSIV gives researchers in AI and social media analysis a resource for studying multimodal and multilingual semantic understanding, though the contribution is incremental: it focuses on dataset creation rather than novel methods.

The authors tackled the lack of diverse datasets for multimodal and multilingual analysis of social media short videos by introducing 3MASSIV, a dataset of 50k annotated and 100k unlabeled videos in 11 languages, capturing trends like pranks and comedy, and demonstrated its utility through strong baselines and analysis.

We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly annotated dataset of diverse short videos extracted from the short-video social media platform Moj. 3MASSIV comprises 50k short videos (20 seconds average duration) and 100k unlabeled videos in 11 different languages, and captures popular short-video trends like pranks, fails, romance, and comedy, expressed via unique audio-visual formats like self-shot videos, reaction videos, lip-synching, self-sung songs, etc. 3MASSIV presents an opportunity for multimodal and multilingual semantic understanding on these unique videos by annotating them for concepts, affective states, media types, and audio language. We present a thorough analysis of 3MASSIV, with strong baselines, and highlight the variety and unique aspects of our dataset compared to other contemporary popular datasets. We also show that the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.
