AI | CL | DB | Sep 22, 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories

Amazon
arXiv:2509.18436v2 | 5 citations | h-index: 11 | EMNLP
AI Analysis

The paper addresses the challenge of real-world multimodal memory recall for applications such as personal assistants or surveillance, though it appears incremental because it builds on existing QA and retrieval techniques.

The paper tackles the problem of answering recall questions about visual content from stored multimodal memories, proposing a pipeline called Pensieve that achieves up to 14% improvement in QA accuracy over state-of-the-art methods.

We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, Pensieve, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We created a multimodal benchmark to illustrate various real challenges in this task, and show the superior performance of Pensieve over state-of-the-art solutions (up to 14% on QA accuracy).
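
To make the idea of time- and location-aware multi-signal retrieval more concrete, the following is a minimal sketch of how several signals might be blended into one ranking score. It is not Pensieve's implementation: the `Memory` fields, the Jaccard text similarity, the exponential time decay, and the weights are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
from dataclasses import dataclass
from datetime import datetime
from math import exp


@dataclass
class Memory:
    # Hypothetical record layout; the paper's memory schema is not given here.
    caption: str        # text description of the stored visual content
    taken_at: datetime  # when the memory was captured
    place: str          # where the memory was captured


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, standing in for a real text/image encoder."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def score(memory, query, when=None, where=None,
          w_text=0.5, w_time=0.3, w_loc=0.2):
    """Weighted sum of text, time, and location signals (weights are assumed)."""
    text_signal = jaccard(memory.caption, query)

    time_signal = 0.0
    if when is not None:
        days_apart = abs((memory.taken_at - when).total_seconds()) / 86400
        time_signal = exp(-days_apart / 7.0)  # decays over roughly a week

    loc_signal = 0.0
    if where is not None and memory.place.lower() == where.lower():
        loc_signal = 1.0

    return w_text * text_signal + w_time * time_signal + w_loc * loc_signal


def retrieve(memories, query, when=None, where=None, k=2):
    """Return the k highest-scoring memories, so a downstream QA model
    can draw on more than one memory when answering."""
    ranked = sorted(memories,
                    key=lambda m: score(m, query, when, where),
                    reverse=True)
    return ranked[:k]
```

Returning the top `k` memories rather than a single best match reflects the task's requirement that an answer may depend on multiple memories; a real system would replace the token-overlap signal with learned embeddings.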

