Advanced ingestion process powered by LLM parsing for RAG system
This work addresses a domain-specific bottleneck for RAG systems that handle diverse document types and represents an incremental improvement.
The paper tackles the problem of processing multimodal documents of varying structural complexity in Retrieval Augmented Generation (RAG) systems. It introduces a multi-strategy parsing approach based on LLM-powered OCR and reports improved answer relevancy and information faithfulness in experimental evaluations.
Retrieval Augmented Generation (RAG) systems struggle with processing multimodal documents of varying structural complexity. This paper introduces a novel multi-strategy parsing approach using LLM-powered OCR to extract content from diverse document types, including presentations and high-text-density files, whether scanned or not. The methodology employs a node-based extraction technique that creates relationships between different information types and generates context-aware metadata. By implementing a Multimodal Assembler Agent and a flexible embedding strategy, the system enhances document comprehension and retrieval capabilities. Experimental evaluations across multiple knowledge bases demonstrate the approach's effectiveness, showing improvements in answer relevancy and information faithfulness.
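The abstract does not say how documents are routed to a parsing strategy or how extracted nodes are related. The Python sketch below shows one plausible reading under loudly stated assumptions: the `DocInfo` and `Node` types, the strategy labels, the 2000-characters-per-page density threshold, the same-page linking rule and the metadata-prefixed embedding text are all hypothetical and not taken from the paper. No LLM or OCR is called; each strategy is returned as a label that a real pipeline would dispatch on.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical document descriptor; the paper does not specify its inputs.
@dataclass
class DocInfo:
    name: str
    kind: str            # assumed labels, e.g. "presentation" or "report"
    scanned: bool
    chars_per_page: int

# Hypothetical extracted node: one piece of text, table or image content.
@dataclass
class Node:
    node_id: str
    node_type: str       # assumed types: "text", "table", "image"
    page: int
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)
    related: List[str] = field(default_factory=list)

HIGH_DENSITY_THRESHOLD = 2000  # hypothetical cut-off, not from the paper

def choose_strategy(doc: DocInfo) -> str:
    """Route a document to one of several hypothetical parsing strategies."""
    if doc.kind == "presentation":
        return "llm_vision_per_slide"
    if doc.scanned:
        return "llm_ocr"
    if doc.chars_per_page >= HIGH_DENSITY_THRESHOLD:
        return "text_layer_with_llm_cleanup"
    return "text_layer"

def link_nodes(nodes: List[Node], source: str) -> List[Node]:
    """Relate nodes of different types on the same page and attach context metadata.

    The same-page rule is an assumed stand-in for the paper's node relationships.
    """
    by_page: Dict[int, List[Node]] = {}
    for node in nodes:
        by_page.setdefault(node.page, []).append(node)
    for page, page_nodes in by_page.items():
        for node in page_nodes:
            # Link to every node of a different type on the same page.
            node.related = [o.node_id for o in page_nodes if o.node_type != node.node_type]
            neighbour_types = sorted({o.node_type for o in page_nodes if o is not node})
            node.metadata.update({
                "source": source,
                "page": str(page),
                "neighbour_types": ",".join(neighbour_types),
            })
    return nodes

def text_for_embedding(node: Node, include_metadata: bool = True) -> str:
    """Compose the string to embed; prefixing metadata is one assumed option."""
    if not include_metadata:
        return node.content
    header = (f"[{node.node_type} | {node.metadata.get('source', '?')} "
              f"p.{node.metadata.get('page', '?')}]")
    return f"{header} {node.content}"

if __name__ == "__main__":
    deck = DocInfo("q3_review.pptx", "presentation", scanned=False, chars_per_page=300)
    print(choose_strategy(deck))            # llm_vision_per_slide
    nodes = link_nodes([
        Node("n1", "text", 1, "Revenue grew in Q3."),
        Node("n2", "table", 1, "| Quarter | Revenue |"),
    ], source="q3_review.pptx")
    print(nodes[0].related)                 # ['n2']
    print(text_for_embedding(nodes[0]))     # [text | q3_review.pptx p.1] Revenue grew in Q3.
```

Keeping the strategy choice as a pure function of document attributes makes the routing easy to test in isolation, and the `include_metadata` switch is one simple way a "flexible" embedding step could toggle between raw content and context-enriched text.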