AUEB-Archimedes at RIRAG-2025: Is obligation concatenation really all you need?
This work addresses regulatory question answering for legal or compliance domains. It is incremental, since it builds on existing retrieval and evaluation methods.
The paper tackles regulatory question answering by retrieving passages and generating answers from them. Its systems combine three retrieval models with a reranker. Directly concatenating the obligations that a neural component of the RePASs metric extracts from the retrieved passages yields a high but dubious score of 0.947, even though such answers are extracted rather than generated. Selecting the best of a few generated answers and refining it iteratively yields a more plausible score of 0.639.
This paper presents the systems we developed for RIRAG-2025, a shared task that requires answering regulatory questions by retrieving relevant passages. The generated answers are evaluated using RePASs, a reference-free and model-based metric. Our systems use a combination of three retrieval models and a reranker. We show that by exploiting a neural component of RePASs that extracts important sentences ('obligations') from the retrieved passages, we achieve a dubiously high score (0.947), even though the answers are directly extracted from the retrieved passages and are not actually generated answers. We then show that by selecting the answer with the best RePASs among a few generated alternatives and then iteratively refining this answer by reducing contradictions and covering more obligations, we can generate readable, coherent answers that achieve a more plausible and relatively high score (0.639).
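The select-then-refine procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_and_refine`, `toy_score`, `toy_refine`, and `OBLIGATIONS` are hypothetical names, the scorer is a simple substring-coverage stand-in for the model-based RePASs metric, and the refinement step stands in for a generator that revises an answer to cover more obligations or reduce contradictions.

```python
from typing import Callable, Sequence, Tuple


def select_and_refine(
    candidates: Sequence[str],
    score: Callable[[str], float],
    refine: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Pick the best-scoring candidate, then keep revisions only while the score improves."""
    if not candidates:
        raise ValueError("need at least one candidate answer")

    # Step 1: best-of-n selection under the given scoring function.
    best = max(candidates, key=score)
    best_score = score(best)

    # Step 2: iterative refinement; stop as soon as a revision fails to improve the score.
    for _ in range(max_rounds):
        revised = refine(best)
        revised_score = score(revised)
        if revised_score <= best_score:
            break
        best, best_score = revised, revised_score

    return best, best_score


# Toy stand-ins (assumptions for illustration only, not the real metric or generator).
OBLIGATIONS = ["must report", "must retain records", "must notify the regulator"]


def toy_score(answer: str) -> float:
    """Fraction of obligations mentioned verbatim in the answer."""
    return sum(ob in answer for ob in OBLIGATIONS) / len(OBLIGATIONS)


def toy_refine(answer: str) -> str:
    """Append one sentence covering the first obligation that is still missing."""
    for ob in OBLIGATIONS:
        if ob not in answer:
            return f"{answer} The firm {ob}."
    return answer


if __name__ == "__main__":
    drafts = ["The firm must report.", "The firm should comply."]
    answer, final_score = select_and_refine(drafts, toy_score, toy_refine)
    print(final_score, answer)
```

Passing the scorer and the refiner as callables keeps the loop independent of any particular metric or language model. The toy scorer also hints at why direct obligation concatenation can score so well: an answer that merely repeats every obligation maximizes coverage without being a generated answer.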