Xuhui Cai

h-index: 6
2 papers

2 Papers

AI · Nov 6, 2025
DMA: Online RAG Alignment with Human Feedback

Yu Bai, Yukai Miao, Dawei Wang et al.

Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning framework that systematically incorporates multi-granularity human feedback to align ranking in interactive settings. DMA organizes document-, list-, and response-level signals into a coherent learning pipeline: supervised training for pointwise and listwise rankers, policy optimization driven by response-level preferences, and knowledge distillation into a lightweight scorer for low-latency serving. Throughout this paper, memory refers to the model's working memory, which is the entire context visible to the LLM for In-Context Learning. We adopt a dual-track evaluation protocol mirroring deployment: (i) large-scale online A/B ablations to isolate the utility of each feedback source, and (ii) few-shot offline tests on knowledge-intensive benchmarks. Online, a multi-month industrial deployment further shows substantial improvements in human engagement. Offline, DMA preserves competitive foundational retrieval while yielding notable gains on conversational QA (TriviaQA, HotpotQA). Taken together, these results position DMA as a principled approach to feedback-driven, real-time adaptation in RAG without sacrificing baseline capability.

IR · Jun 21, 2024
Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback

Yu Bai, Yukai Miao, Li Chen et al.

RAG systems face limitations when semantic relevance alone does not guarantee improved generation quality. This issue becomes particularly evident due to the sensitivity of large language models (LLMs) to the ordering of few-shot prompts, which can affect model performance. To address this challenge, aligning LLM outputs with human preferences using structured feedback, such as options to copy, regenerate, or dislike, offers a promising method for improvement. This feedback is applied to the entire list of inputs rather than giving specific ratings for individual documents, making it a Listwide Labels Learning-to-Rank task. To address this task, we propose Pistis-RAG, a new RAG framework designed with a content-centric approach to better align LLMs with human preferences. Pistis-RAG effectively utilizes human feedback, enhancing content ranking and generation quality. To validate our framework, we use public datasets to simulate human feedback, allowing us to evaluate and refine our method effectively. Experimental results indicate that Pistis-RAG improves alignment with human preferences relative to the baseline RAG system, showing a 6.06% increase in MMLU (English) and a 7.08% increase in C-EVAL (Chinese) accuracy metrics. These results highlight Pistis-RAG's effectiveness in overcoming the limitations associated with traditional RAG approaches.