CL, Apr 9, 2025

Evaluating Retrieval Augmented Generative Models for Document Queries in Transportation Safety

Chad Melton, Alex Sorokine, Steve Peterson

arXiv:2504.07022v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the accuracy and reliability challenges of deploying generative models in high-stakes domains such as hazardous materials transportation, though it is an incremental application of existing RAG methods to a new domain.

The study evaluated fine-tuned generative models, including ChatGPT, Vertex AI, and RAG-augmented LLaMA models, for retrieving regulatory information in hazardous materials transportation, finding that RAG-augmented LLaMA models significantly outperformed the others in accuracy and detail.

Applications of generative Large Language Models (LLMs) are rapidly expanding across various domains, promising significant improvements in workflow efficiency and information retrieval. However, their implementation in specialized, high-stakes domains such as hazardous materials transportation is challenging due to accuracy and reliability concerns. This study evaluates the performance of three fine-tuned generative models in retrieving regulatory information essential for hazardous material transportation compliance in the United States: ChatGPT, Google's Vertex AI, and ORNL LLaMA 2 and LLaMA models augmented with Retrieval Augmented Generation (RAG). Using approximately 40 publicly available federal and state regulatory documents, we developed 100 realistic queries relevant to route planning and permitting requirements. Responses were qualitatively rated on accuracy, detail, and relevance, and these ratings were complemented by quantitative assessments of semantic similarity between model outputs. Results demonstrated that the RAG-augmented LLaMA models significantly outperformed Vertex AI and ChatGPT, providing more detailed and generally accurate information, despite occasional inconsistencies. This research introduces the first known application of RAG in transportation safety, emphasizing the need for domain-specific fine-tuning and rigorous evaluation methodologies to ensure reliability and minimize the risk of inaccuracies in high-stakes environments.
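The abstract mentions a quantitative assessment of semantic similarity between model outputs but does not say how it was computed. As a rough illustration only, the sketch below compares responses pairwise using cosine similarity over bag-of-words counts. This is a simple lexical stand-in chosen so the example needs nothing beyond the Python standard library; it is not the authors' method, and a real semantic comparison would more likely use sentence embeddings. The model names and response texts are hypothetical placeholders, not data from the study.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(responses):
    """Map each unordered pair of model names to the similarity of their responses."""
    vectors = {name: bag_of_words(text) for name, text in responses.items()}
    return {
        (m1, m2): cosine_similarity(vectors[m1], vectors[m2])
        for m1, m2 in combinations(sorted(vectors), 2)
    }


# Hypothetical responses to a single query (assumed for illustration only).
responses = {
    "model_a": "A permit is required for oversize loads.",
    "model_b": "Oversize loads require a permit.",
    "model_c": "Check the state route map.",
}

scores = pairwise_similarity(responses)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Repeating this over every query and averaging per model pair would give one similarity figure per pair. Note that high agreement between two models says nothing about whether either answer is correct, which is presumably why the study pairs this measure with qualitative ratings.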
