CL · May 29, 2025

SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services

arXiv:2505.23065v1 · 2 citations · h-index: 16 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better tools to evaluate multimodal AI on social media platforms, though its contribution is incremental because it builds on existing benchmarking efforts.

The paper tackles the lack of multimodal benchmarks for evaluating Vision-Language Large Language Models in social networking services. It introduces SNS-Bench-VL, a benchmark of 4,001 question-answer pairs across 8 tasks, and evaluates more than 25 models, highlighting persistent challenges in social context comprehension.

With the increasing integration of visual and textual content in Social Networking Services (SNS), evaluating the multimodal capabilities of Large Language Models (LLMs) is crucial for enhancing user experience, content understanding, and platform intelligence. Existing benchmarks primarily focus on text-centric tasks, lacking coverage of the multimodal contexts prevalent in modern SNS ecosystems. In this paper, we introduce SNS-Bench-VL, a comprehensive multimodal benchmark designed to assess the performance of Vision-Language LLMs in real-world social media scenarios. SNS-Bench-VL incorporates images and text across 8 multimodal tasks, including note comprehension, user engagement analysis, information retrieval, and personalized recommendation. It comprises 4,001 carefully curated multimodal question-answer pairs, covering single-choice, multiple-choice, and open-ended tasks. We evaluate over 25 state-of-the-art multimodal LLMs, analyzing their performance across tasks. Our findings highlight persistent challenges in multimodal social context comprehension. We hope SNS-Bench-VL will inspire future research towards robust, context-aware, and human-aligned multimodal intelligence for next-generation social networking services.
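The abstract distinguishes single-choice, multiple-choice, and open-ended items. As a rough illustration only, the sketch below shows one common way closed-form items of this kind could be represented and scored: exact match for single-choice and set-level F1 for multiple-choice. The `Item` type, its field names, the `"single"`/`"multiple"` labels, and the choice of metrics are all hypothetical assumptions, not SNS-Bench-VL's published data format or scoring protocol. Open-ended answers are left out because they usually need a reference-based or model-based judge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical closed-form benchmark item (field names are illustrative)."""
    kind: str              # assumed labels: "single" or "multiple"
    gold: frozenset[str]   # correct option letters, e.g. frozenset({"B"})


def score(item: Item, predicted: set[str]) -> float:
    """Return a per-item score in [0, 1] for an already-extracted set of option letters."""
    if item.kind == "single":
        # Exact match: the predicted set must equal the single correct option.
        return 1.0 if predicted == set(item.gold) else 0.0
    if item.kind == "multiple":
        # Set-level F1 between predicted and gold options (an assumed metric).
        hits = len(predicted & item.gold)
        if hits == 0:
            return 0.0
        precision = hits / len(predicted)
        recall = hits / len(item.gold)
        return 2 * precision * recall / (precision + recall)
    raise ValueError(f"unsupported item kind: {item.kind}")


def mean_score(pairs: list[tuple[Item, set[str]]]) -> float:
    """Average the per-item scores over (item, predicted options) pairs."""
    scores = [score(item, pred) for item, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Usage: one correct single-choice answer, one partially correct multiple-choice answer.
examples = [
    (Item("single", frozenset({"B"})), {"B"}),        # scores 1.0
    (Item("multiple", frozenset({"A", "C"})), {"A"}),  # precision 1.0, recall 0.5, F1 2/3
]
print(round(mean_score(examples), 3))  # → 0.833
```

Partial credit via F1 is used here so that a model naming some, but not all, correct options is not scored the same as one that names none; a stricter benchmark might instead require an exact set match.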

Code Implementations: 1 repo
