A Grounded Memory System For Smart Personal Assistants
The work addresses the need for reliable memory in applications such as cognitive assistants and robotics, but it appears incremental: it integrates existing methods and does not claim a major breakthrough.
The paper tackles the problem of creating a robust memory system for agentic AI applications by proposing a three-component system combining Vision Language Models, Large Language Models, and a knowledge graph with vector embeddings, illustrated through a real-world example.
A wide variety of agentic AI applications, ranging from cognitive assistants for dementia patients to robotics, demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, during perception, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction. Second, the extracted information is stored in a memory consisting of a knowledge graph enhanced by vector embeddings, which allows relational information to be managed efficiently. Third, we combine semantic search and graph query generation for question answering via Retrieval-Augmented Generation. We illustrate the system's operation and potential using a real-world example.
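To make the second and third components concrete, the following is a minimal sketch of a knowledge graph whose nodes carry vector embeddings, queried by semantic search followed by a graph lookup. It is not the paper's implementation. The class `GraphMemory`, the functions `embed` and `cosine`, the relation names, and the example entities are all hypothetical, and a toy bag-of-words vector stands in for a learned embedding model. The perception step (Vision Language Model captioning and Large Language Model extraction) is assumed to have already produced the triples, and the final answer generation by a Large Language Model is omitted.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class GraphMemory:
    """Hypothetical memory: a triple store plus one embedding per entity."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object), ...]
        self.vectors = {}               # entity name -> embedding of its description

    def add_entity(self, name, description):
        self.vectors[name] = embed(description)

    def add_triple(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def semantic_search(self, query, k=1):
        """Return the k entities whose descriptions are most similar to the query."""
        q = embed(query)
        ranked = sorted(self.vectors,
                        key=lambda name: cosine(q, self.vectors[name]),
                        reverse=True)
        return ranked[:k]

    def retrieve_context(self, query, k=1):
        """Semantic search picks entry nodes; their outgoing edges become facts."""
        facts = []
        for node in self.semantic_search(query, k):
            facts.extend(f"{node} {rel} {obj}" for rel, obj in self.edges[node])
        return facts


# Triples as they might come out of the perception step (made-up data).
memory = GraphMemory()
memory.add_entity("keys", "a set of house keys on a red keyring")
memory.add_entity("glasses", "reading glasses with a black frame")
memory.add_triple("keys", "located_on", "kitchen table")
memory.add_triple("glasses", "located_in", "coat pocket")

# Retrieved facts would be passed to a language model as grounding context.
context = memory.retrieve_context("where are my house keys")
print(context)
```

In this sketch the embedding lookup only selects where to enter the graph, while the facts returned come from explicit edges. This keeps the retrieved context tied to stored observations, which is one plausible reading of how semantic search and graph queries complement each other in the proposed design.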