MoK-RAG: Mixture of Knowledge Paths Enhanced Retrieval-Augmented Generation for Embodied AI Environments
This work addresses the cognitive-algorithmic discrepancy in RAG systems for Embodied AI environments. It is an incremental improvement that extends multi-source retrieval to 3D scene generation.
The paper tackles the limitation of single-source knowledge retrieval in Retrieval-Augmented Generation (RAG) systems by introducing MoK-RAG, a multi-source framework that partitions knowledge into specialized paths. Its 3D variant, MoK-RAG3D, applies the framework to 3D simulated environments; automated and human evaluations indicate that it enhances scene generation for Embodied AI agents.
While human cognition inherently retrieves information from diverse and specialized knowledge sources during decision-making, current Retrieval-Augmented Generation (RAG) systems typically retrieve from a single knowledge source, leading to a cognitive-algorithmic discrepancy. To bridge this gap, we introduce MoK-RAG, a novel multi-source RAG framework that implements a mixture-of-knowledge-paths retrieval mechanism: the corpus of a large language model (LLM) is functionally partitioned into distinct sections, enabling retrieval from multiple specialized knowledge paths. Applied to the generation of 3D simulated environments, our proposed MoK-RAG3D extends this paradigm by partitioning 3D assets into distinct sections and organizing them in a hierarchical knowledge tree. Unlike previous methods that rely solely on manual evaluation, we pioneer automated evaluation methods for 3D scenes. Both automatic and human evaluations in our experiments demonstrate that MoK-RAG3D can assist Embodied AI agents in generating diverse scenes.
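As a rough illustration of retrieving from multiple specialized knowledge paths, the following minimal Python sketch partitions a toy corpus into named sections and retrieves the best-matching passage from each one before assembling a prompt. It is not the MoK-RAG implementation: the partition names, the passages, the word-overlap scoring, and the helper names `retrieve_per_path` and `build_prompt` are all assumptions made for illustration, and a real system would presumably use a learned retriever instead.

```python
from collections import Counter

# Hypothetical knowledge paths: each key is one partition of the corpus.
# Names and passages are illustrative only, not taken from the paper.
KNOWLEDGE_PATHS = {
    "furniture": [
        "a sofa usually faces the television in a living room",
        "a bed is placed against a wall in a bedroom",
    ],
    "lighting": [
        "a floor lamp stands next to a sofa",
        "a ceiling light hangs in the center of a room",
    ],
    "decor": [
        "a painting hangs on the wall above a sofa",
        "a rug lies under the bed",
    ],
}


def score(query: str, passage: str) -> int:
    """Count shared word occurrences (a stand-in for a real relevance score)."""
    query_words = Counter(query.lower().split())
    passage_words = Counter(passage.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    return sum((query_words & passage_words).values())


def retrieve_per_path(query: str, paths: dict, k: int = 1) -> dict:
    """Retrieve the top-k passages from every partition independently."""
    results = {}
    for name, passages in paths.items():
        ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
        results[name] = ranked[:k]
    return results


def build_prompt(query: str, retrieved: dict) -> str:
    """Concatenate the task with evidence labeled by its knowledge path."""
    lines = [f"Task: {query}"]
    for name, passages in retrieved.items():
        for passage in passages:
            lines.append(f"[{name}] {passage}")
    return "\n".join(lines)


query = "place a sofa in the living room"
retrieved = retrieve_per_path(query, KNOWLEDGE_PATHS)
prompt = build_prompt(query, retrieved)
print(prompt)
```

The point of the sketch is that every partition contributes evidence, whereas a single-source retriever ranking one pooled corpus could return passages from only one section.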