HC AI | Sep 12, 2024

OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

Jiahao Nick Li, Zhuohao Jerry Zhang, Jiaju Ma

arXiv:2409.08250v2 | 13.913 citations | h-index: 4

Originality: Highly original

AI Analysis

OmniQuery addresses a challenge faced by users who capture multimodal memories but struggle to answer complex queries about them, such as queries involving sequential events. It is a novel method for a known bottleneck rather than a foundational advance.

The paper tackles the problem of answering complex personal memory-related questions that require interpreting interconnected memories. It introduces OmniQuery, a system that achieved 71.5% accuracy in human evaluations and outperformed a conventional RAG system by winning or tying 74.5% of the time.

People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they only support retrieving individual pieces of information, such as certain objects in photos, and struggle to answer more complex queries that involve interpreting interconnected memories, such as sequential events. We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of the contextual information needed to integrate with captured memories. We then introduce OmniQuery, a novel system that can answer complex personal memory-related questions that require extracting and inferring contextual information. OmniQuery augments individual captured memories by integrating scattered contextual information from multiple interconnected memories. Given a question, OmniQuery retrieves relevant augmented memories and uses a large language model (LLM) to generate answers with references. In human evaluations, we show the effectiveness of OmniQuery with an accuracy of 71.5%, outperforming a conventional RAG system by winning or tying 74.5% of the time.
