CL, Jun 17, 2025

Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent

arXiv:2506.14302v1, 4 citations, h-index: 12, ACL
Originality: Highly original
AI Analysis

This paper addresses the challenge of sustaining user guidance in conversational recommendation systems and represents an incremental advance in multi-turn dialogue optimization.

The paper tackles short-sighted responses in multi-turn conversational recommendation agents. It introduces ECPO, a multi-turn preference optimization paradigm that uses Expectation Confirmation Theory to model how user satisfaction evolves over a dialogue. The authors report significant improvements in efficiency and effectiveness over existing methods.

Recent advancements in Large Language Models (LLMs) have significantly propelled the development of Conversational Recommendation Agents (CRAs). However, these agents often generate short-sighted responses that fail to sustain user guidance and meet expectations. Although preference optimization has proven effective in aligning LLMs with user expectations, it remains costly and performs poorly in multi-turn dialogue. To address this challenge, we introduce ECPO, a novel multi-turn preference optimization (MTPO) paradigm that leverages Expectation Confirmation Theory to explicitly model the evolution of user satisfaction throughout multi-turn dialogues and to uncover the underlying causes of dissatisfaction. These causes can then support targeted optimization of unsatisfactory responses, thereby achieving turn-level preference optimization. ECPO eliminates the significant sampling overhead of existing MTPO methods while ensuring that the optimization process drives meaningful improvements. To support ECPO, we introduce an LLM-based user simulator, AILO, which simulates user feedback and performs expectation confirmation during conversational recommendations. Experimental results show that ECPO significantly enhances CRAs' interaction capabilities, delivering notable improvements in both efficiency and effectiveness over existing MTPO methods.
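
The turn-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each turn carries a numeric expectation score and a perceived-performance score (in the paper these judgments would come from the AILO simulator), that dissatisfaction means the perceived score falls below the expectation, and that a caller-supplied `rewrite` function produces the improved response. The names `Turn`, `PreferencePair`, `confirmation`, and `build_turn_level_pairs` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    user_utterance: str
    agent_response: str
    expectation: float  # hypothetical score in [0, 1]: what the simulated user hoped for
    perceived: float    # hypothetical score in [0, 1]: how the response was perceived
    cause: str = ""     # hypothetical free-text reason for dissatisfaction


@dataclass
class PreferencePair:
    prompt: str    # dialogue context up to and including the user's utterance
    chosen: str    # rewritten response, preferred
    rejected: str  # original unsatisfactory response


def confirmation(turn: Turn) -> float:
    """Expectation confirmation: positive when the response beat the expectation."""
    return turn.perceived - turn.expectation


def build_turn_level_pairs(
    dialogue: List[Turn],
    rewrite: Callable[[Turn], str],
    threshold: float = 0.0,
) -> List[PreferencePair]:
    """Create one preference pair per turn whose confirmation is below threshold.

    Only unsatisfactory turns are rewritten, so no extra sampling of
    alternative dialogue continuations is needed in this sketch.
    """
    pairs: List[PreferencePair] = []
    history: List[str] = []
    for turn in dialogue:
        history.append(f"User: {turn.user_utterance}")
        if confirmation(turn) < threshold:
            pairs.append(
                PreferencePair(
                    prompt="\n".join(history),
                    chosen=rewrite(turn),
                    rejected=turn.agent_response,
                )
            )
        # The original response stays in the history for later turns.
        history.append(f"Agent: {turn.agent_response}")
    return pairs
```

The resulting pairs have the prompt/chosen/rejected shape that preference-optimization trainers commonly consume. How ECPO actually scores turns, rewrites responses, and trains on the pairs is specified in the paper itself, not here.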

