Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
This work addresses the challenge of summarizing and answering questions about a robot's past for human-robot interaction. The contribution is incremental, as it builds on existing hierarchical representations and large language models.
The authors tackle the problem of verbalizing a robot's life-long experiences for improved human-robot interaction by applying large pretrained models with zero or few examples. The result is a scalable method that keeps computational costs low even with months of data.
Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short (several-minute-long) streams of episodic data, limiting generalization and transferability. In our work, we apply large pretrained models to tackle this task with zero or few examples, and specifically focus on verbalizing life-long experiences. For this, we derive a tree-like data structure from episodic memory (EM), with lower levels representing raw perception and proprioception data, and higher levels abstracting events to natural language concepts. Given such a hierarchical representation built from the experience stream, we apply a large language model as an agent to interactively search the EM given a user's query, dynamically expanding (initially collapsed) tree nodes to find the relevant information. The approach keeps computational costs low even when scaling to months of robot experience data. We evaluate our method on simulated household robot data, human egocentric videos, and real-world robot recordings, demonstrating its flexibility and scalability.
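The tree search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` layout, the `search` function, and the example memory are hypothetical, and a toy keyword-overlap score stands in for the large language model that decides which collapsed node to expand next.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One node of a hierarchical episodic memory (hypothetical layout)."""
    summary: str                                  # natural-language abstraction of the subtree
    children: List["Node"] = field(default_factory=list)
    raw: Optional[str] = None                     # placeholder for raw perception data at a leaf


def keyword_score(query: str, text: str) -> int:
    """Toy relevance score: number of words shared by query and text.
    Assumption: this replaces the language model's choice of node."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(root: Node, query: str, max_expansions: int = 10) -> Tuple[Node, List[str]]:
    """Greedy top-down search that expands one child per level.
    Nodes that are never expanded are never read, so the cost depends on
    the tree depth rather than on the total amount of stored experience."""
    node = root
    path = [root.summary]
    expansions = 0
    while node.children and expansions < max_expansions:
        # "Expand" the current node and descend into its best-matching child.
        node = max(node.children, key=lambda c: keyword_score(query, c.summary))
        path.append(node.summary)
        expansions += 1
    return node, path


# Invented two-day memory, for illustration only.
memory = Node("all experience", [
    Node("monday: cleaned the kitchen and washed dishes", [
        Node("morning: wiped the kitchen counter", raw="frames_0001"),
        Node("afternoon: washed dishes in the sink", raw="frames_0002"),
    ]),
    Node("tuesday: delivered a package to the office", [
        Node("morning: picked up package at the door", raw="frames_0003"),
        Node("noon: handed package to a person in the office", raw="frames_0004"),
    ]),
])

leaf, path = search(memory, "when did you hand the package to the person")
print(" -> ".join(path))
```

In this sketch the query visits only the Tuesday branch, and the Monday subtree stays collapsed. A real agent would presumably also be able to backtrack or expand several nodes, which the greedy loop above does not attempt.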