RIRAG: Regulatory Information Retrieval and Answer Generation
This work targets legal compliance for organizations by simplifying access to regulatory rules. Its contribution is incremental, since it builds on existing regulatory NLP efforts.
The paper tackles the challenge of interpreting complex, frequently updated regulatory documents. It introduces a task of generating question-passage pairs to support regulatory question-answering systems, creates the ObliQA dataset of 27,869 questions, and builds a baseline RIRAG system that is evaluated with a novel metric.
Regulatory documents, issued by governmental regulatory bodies, establish rules, guidelines, and standards that organizations must adhere to for legal compliance. These documents are long, complex, and frequently updated, which makes them challenging to interpret: organizations must commit significant time and expertise to ensure ongoing compliance. Regulatory Natural Language Processing (RegNLP) is a multidisciplinary field aimed at simplifying access to and interpretation of regulatory rules and obligations. We introduce a task of generating question-passage pairs, where questions are automatically created and paired with relevant regulatory passages, facilitating the development of regulatory question-answering systems. We create the ObliQA dataset, containing 27,869 questions derived from the collection of Abu Dhabi Global Markets (ADGM) financial regulation documents, and design a baseline Regulatory Information Retrieval and Answer Generation (RIRAG) system. We evaluate the system with RePASs, a novel evaluation metric that tests whether generated answers accurately capture all relevant obligations while avoiding contradictions.