A Survey on Retrieval And Structuring Augmented Generation with Large Language Models
This survey addresses critical limitations in deploying LLMs for real-world applications, but as a survey it synthesizes existing work rather than introducing new methods.
This survey tackles the challenges of hallucination, outdated knowledge, and limited domain expertise in large language models by examining retrieval and structuring augmented generation methods, providing a comprehensive overview of techniques and future directions for researchers and practitioners.
Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. This survey (1) examines retrieval mechanisms, including sparse, dense, and hybrid approaches, for accessing external knowledge; (2) explores text structuring techniques such as taxonomy construction, hierarchical classification, and information extraction that transform unstructured text into organized representations; and (3) investigates how these structured representations integrate with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques. It also identifies technical challenges in retrieval efficiency, structure quality, and knowledge integration, while highlighting research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems. This comprehensive overview provides researchers and practitioners with insights into RAS methods, applications, and future directions.
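To make the distinction between sparse, dense, and hybrid retrieval concrete, the following is a minimal sketch and is not taken from the survey. It rests on several assumptions: a TF-IDF overlap score stands in for a sparse retriever such as BM25, a hashed character-trigram vector stands in for a learned dense encoder, and the two are fused with a weighted sum after min-max normalization, which is only one of several possible fusion schemes. All function names and the `alpha` parameter are hypothetical.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    return text.lower().split()


def sparse_scores(query, docs):
    """TF-IDF term-overlap score (assumed stand-in for a sparse retriever)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(
            sum(tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf)
        )
    return scores


def embed(text, dim=64):
    """Toy 'dense' vector from hashed character trigrams (not a learned encoder)."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        # crc32 keeps the bucket assignment deterministic across runs.
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_scores(query, docs):
    """Cosine similarity between the query vector and each document vector."""
    q = embed(query)
    return [sum(a * b for a, b in zip(q, embed(d))) for d in docs]


def min_max(scores):
    """Rescale scores to [0, 1]; all zeros if every score is identical."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, docs, alpha=0.5):
    """Rank documents by alpha * sparse + (1 - alpha) * dense, best first."""
    sparse = min_max(sparse_scores(query, docs))
    dense = min_max(dense_scores(query, docs))
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    return sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "sparse retrieval uses term matching",
        "dense retrieval uses learned embeddings",
        "taxonomy construction organizes concepts",
    ]
    for idx, score in hybrid_rank("taxonomy construction", corpus, alpha=0.7):
        print(f"{score:.3f}  {corpus[idx]}")
```

Setting `alpha` closer to 1 favors exact term matches, while values closer to 0 favor the vector similarity; a real system would replace both scoring functions with production components.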