AI CL DB | Sep 22, 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories

Hongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiun-Zu Kuo, Jiayang Xu, Ankur Bansal, Christopher Brossman, Yue Liu, Aaron Colak, Ahmed Aly, Anuj Kumar

Amazon

arXiv:2509.18436v2 | 12.45 citations | h-index: 11 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the challenge of real-world multimodal memory recall for applications such as personal assistants or surveillance. It appears incremental, however, because it builds on existing QA and retrieval techniques.

The paper tackles the problem of answering recall questions about visual content from stored multimodal memories, proposing a pipeline called Pensieve that achieves up to 14% improvement in QA accuracy over state-of-the-art methods.

We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, Pensieve, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We created a multimodal benchmark to illustrate various real challenges in this task, and show the superior performance of Pensieve over state-of-the-art solutions (up to 14% on QA accuracy).
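To make "time- and location-aware multi-signal retrieval" concrete, here is a minimal Python sketch of the general idea: each stored memory is ranked by a weighted mix of text match, temporal proximity, and geographic proximity. This is an illustration under stated assumptions, not Pensieve's actual method. The `Memory` fields, the weights, the decay constants (7 days, 10 km), and the use of token overlap in place of a learned retriever are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    # Hypothetical memory record; the field names are assumptions for this sketch.
    caption: str
    taken_at: datetime
    lat: float
    lon: float


def text_score(query, caption):
    # Jaccard overlap of lowercase tokens, a stand-in for a learned text/image retriever.
    q, c = set(query.lower().split()), set(caption.lower().split())
    return len(q & c) / len(q | c) if (q | c) else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two latitude/longitude points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def score(memory, query, when=None, where=None, weights=(0.6, 0.2, 0.2)):
    # Weighted sum of the three signals; the weights and decay scales are assumed values.
    w_text, w_time, w_loc = weights
    s = w_text * text_score(query, memory.caption)
    if when is not None:
        days = abs((memory.taken_at - when).total_seconds()) / 86400
        s += w_time * math.exp(-days / 7.0)   # decays over roughly a week
    if where is not None:
        km = haversine_km(memory.lat, memory.lon, where[0], where[1])
        s += w_loc * math.exp(-km / 10.0)     # decays over roughly 10 km
    return s


def retrieve(memories, query, when=None, where=None, k=2):
    # Return the k highest-scoring memories; several can then be passed to a QA model.
    ranked = sorted(memories, key=lambda m: score(m, query, when, where), reverse=True)
    return ranked[:k]
```

Returning the top `k` memories rather than a single best match reflects the abstract's point that a recall question may need to draw on multiple memories. In a real system the time and location hints would be extracted from the question itself, which this sketch leaves to the caller.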
