CL · May 11

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta

arXiv:2605.10862

AI Analysis

For developers and auditors of retrieval-augmented LLM systems, RUBEN offers a new way to interpret model behavior and assess safety, although the paper is a tool demonstration and reports no quantitative benchmarks.

RUBEN is an interactive tool that extracts minimal rules explaining the outputs of retrieval-augmented LLMs. It identifies these rule sets efficiently and demonstrates their use in testing LLM safety and robustness to adversarial prompts.

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.
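The abstract does not define what a rule is or how the pruning works, so the following is only a minimal sketch under stated assumptions. Assume a rule is a subset of the retrieved documents that is sufficient for the LLM to reproduce its original answer. A rule then subsumes every superset of it, so a bottom-up search can skip supersets of rules it has already found instead of querying the model on them. This is a generic Apriori-style search, not RUBEN's actual algorithm. The names `minimal_sufficient_subsets`, `output_holds`, and `toy_oracle` are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence


def minimal_sufficient_subsets(
    docs: Sequence[str],
    output_holds: Callable[[FrozenSet[str]], bool],
    max_size: Optional[int] = None,
) -> List[FrozenSet[str]]:
    """Return every subset of `docs` that preserves the output and has no
    smaller preserving subset (the minimal "rules" under our assumption)."""
    found: List[FrozenSet[str]] = []
    limit = len(docs) if max_size is None else max_size
    # Level by level: all smaller subsets are tested before any larger one.
    for k in range(1, limit + 1):
        for combo in combinations(docs, k):
            candidate = frozenset(combo)
            # Pruning: a superset of an already-found rule is subsumed by it,
            # so skip it without calling the (expensive) LLM oracle.
            if any(rule <= candidate for rule in found):
                continue
            if output_holds(candidate):
                found.append(candidate)
    return found


# Hypothetical toy oracle standing in for "re-run the RAG pipeline on this
# subset of retrieved documents and check that the answer is unchanged".
calls = 0


def toy_oracle(subset: FrozenSet[str]) -> bool:
    global calls
    calls += 1
    return "d1" in subset or {"d2", "d3"} <= subset


rules = minimal_sufficient_subsets(["d1", "d2", "d3", "d4"], toy_oracle)
print(sorted(sorted(r) for r in rules))  # [['d1'], ['d2', 'd3']]
print(calls)  # 7 oracle calls, versus 15 for exhaustive enumeration
```

In a real pipeline each oracle call would be an LLM invocation, so the savings from pruning grow with the number of supersets that can be skipped. The worst case remains exponential in the number of retrieved documents, which is why a `max_size` cap is included.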

