Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A
This study addresses transparency in NLP systems for organizations subject to the GDPR, offering an incremental step toward integrating advanced NLP systems into legal compliance frameworks.
The study evaluates whether RAG systems can meet the GDPR's transparency obligations and finds that RAG systems with an alignment module outperform baseline RAG systems on most of the 21 metrics used, although no system fully matches human answers. The size of the improvement is not specified.
The transparency principle of the General Data Protection Regulation (GDPR) requires data processing information to be clear, precise, and accessible. While language models show promise in this context, their probabilistic nature complicates truthfulness and comprehensibility. This paper examines state-of-the-art Retrieval-Augmented Generation (RAG) systems enhanced with alignment techniques to fulfill GDPR obligations. We evaluate RAG systems incorporating an alignment module, such as Rewindable Auto-regressive Inference (RAIN) and our proposed multidimensional extension, MultiRAIN, using a Privacy Q&A dataset. Responses are optimized for preciseness and comprehensibility and are assessed through 21 metrics, including deterministic and large language model-based evaluations. Our results show that RAG systems with an alignment module outperform baseline RAG systems on most metrics, though none fully match human answers. Principal component analysis of the results reveals complex interactions between metrics, highlighting the need to refine the metrics. This study provides a foundation for integrating advanced natural language processing systems into legal compliance frameworks.
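To make the principal component analysis step more concrete, the following is a minimal sketch of how a table of per-answer metric scores could be reduced to its first principal component. It is not the authors' implementation: the three metric names and all score values are hypothetical toy data, and power iteration is used here only as a simple, dependency-free way to obtain the leading component.

```python
import math

# Hypothetical toy data: one row per evaluated answer, one column per metric.
# Metric names and values are illustrative assumptions, not results from the paper.
METRICS = ["preciseness", "readability", "verbosity"]
scores = [
    [0.2, 0.1, 0.9],
    [0.4, 0.3, 0.6],
    [0.6, 0.5, 0.6],
    [0.8, 0.7, 0.3],
]


def center_columns(rows):
    """Subtract each column's mean so that every metric has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance between metric columns (n - 1 denominator)."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def first_principal_component(cov, iterations=500):
    """Power iteration for the leading eigenpair of a covariance matrix.

    Returns (unit-length loadings, eigenvalue). The uniform start vector is
    an assumption; it fails if it is orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return v, eigenvalue


cov = covariance_matrix(center_columns(scores))
loadings, eigenvalue = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

for name, weight in zip(METRICS, loadings):
    print(f"{name:12s} loading = {weight:+.3f}")
print(f"share of variance on first component = {explained_ratio:.3f}")
```

In this toy table the first two metrics rise together while the third falls, so their loadings on the first component take opposite signs. Patterns of this kind are one way in which metrics can turn out to be redundant or in tension with one another. With all 21 metrics, a library routine that returns every component would be the more practical choice.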