CL · May 15

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

arXiv:2605.15710 · 89.7 · Has Code
Predicted impact: top 33% in CL · last 90 days · Originality: Incremental advance
AI Analysis

For researchers building multimodal agents, this benchmark reveals a previously under-evaluated bottleneck in source-distributed memory composition.

SMMBench is a benchmark for evaluating multimodal agents' ability to retrieve, align, and compose evidence distributed across multiple independent sources. Experiments show current systems struggle, with no model exceeding 60% accuracy on the hardest tasks.

Existing benchmarks for multimodal memory reasoning largely evaluate systems within pre-assembled contexts, but under-evaluate whether agents can use evidence distributed across independently originated sources. We argue that source-distributed memory composition is an important and under-examined bottleneck in multimodal agent memory, especially when relevant evidence is fragmented across heterogeneous artifacts such as conversations, profiles, screenshots, tables, images, and documents. To address this gap, we introduce the Source-distributed Multimodal Memory Benchmark (SMMBench), which measures whether agents can retrieve, align, and compose multimodal evidence scattered across multiple sources rather than reason within a single curated context. SMMBench evaluates four core capabilities: (1) cross-source multimodal reasoning; (2) conflict resolution; (3) preference reasoning; (4) memory-grounded action prediction. The benchmark contains 1877 samples grounded in 264 sources. Experiments on representative memory-style and retrieval-based baselines show that current systems still struggle on these capabilities, positioning source-distributed multimodal memory as an important and still under-evaluated challenge for multimodal agents. Our data are available at https://huggingface.co/datasets/HuacanChai/SMMBench.
