CL AI IR

Sep 24, 2025

DyBBT: Dynamic Balance via Bandit inspired Targeting for Dialog Policy with Cognitive Dual-Systems

Shuyu Zhang, Yifan Wei, Jialuo Yuan, Xinru Wang, Yanmin Zhu, Bin Li

arXiv:2509.19695v1

Has Code

Originality: Incremental advance

AI Analysis

This work addresses inefficient exploration in task-oriented dialog systems with a novel method for dynamic adaptation. The contribution appears incremental, however, because it builds on existing cognitive dual-system concepts.

The paper tackles inefficient exploration in task-oriented dialog systems by proposing DyBBT, a framework that dynamically switches between fast intuitive and slow deliberative reasoning based on real-time cognitive states. The authors report state-of-the-art success rate, efficiency, and generalization on single- and multi-domain benchmarks, with human evaluations confirming that the system's decisions align with expert judgment.

Task-oriented dialog systems often rely on static exploration strategies that do not adapt to dynamic dialog contexts, leading to inefficient exploration and suboptimal performance. We propose DyBBT, a novel dialog policy learning framework that formalizes the exploration challenge through a structured cognitive state space capturing dialog progression, user uncertainty, and slot dependency. DyBBT introduces a bandit-inspired meta-controller that dynamically switches between fast intuitive inference (System 1) and a slow deliberative reasoner (System 2) based on real-time cognitive states and visitation counts. Extensive experiments on single- and multi-domain benchmarks show that DyBBT achieves state-of-the-art performance in success rate, efficiency, and generalization, with human evaluations confirming that its decisions are well aligned with expert judgment. Code is available at https://github.com/carsonz/DyBBT.
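To make the switching idea concrete, the following is a minimal Python sketch of a count-based meta-controller. It is not taken from the DyBBT repository: the class name `MetaController`, the helper `discretize`, the five-bin discretization, the UCB-style bonus formula, and the threshold value are all illustrative assumptions. The only parts grounded in the abstract are the three cognitive-state dimensions (dialog progression, user uncertainty, slot dependency) and the use of visitation counts to decide between System 1 and System 2.

```python
import math
from collections import defaultdict


def discretize(progress, uncertainty, dependency, bins=5):
    """Map three values in [0, 1] to a coarse cognitive-state tuple.

    The bin count is an assumption made for this sketch.
    """
    def bucket(x):
        return min(int(x * bins), bins - 1)

    return (bucket(progress), bucket(uncertainty), bucket(dependency))


class MetaController:
    """Hypothetical controller: rarely visited states go to System 2."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold      # assumed switching threshold
        self.visits = defaultdict(int)  # visitation count per cognitive state
        self.total = 0                  # total number of decisions so far

    def bonus(self, state):
        # UCB-style exploration bonus (an assumed form): large for states
        # that have been visited rarely, shrinking as the count grows.
        return math.sqrt(2.0 * math.log(self.total + 1) / (self.visits[state] + 1))

    def choose(self, state):
        self.total += 1
        b = self.bonus(state)
        self.visits[state] += 1
        # High bonus: the state is under-explored, so deliberate (System 2).
        # Low bonus: the state is familiar, so act intuitively (System 1).
        return "system2" if b > self.threshold else "system1"
```

In this sketch, a controller that sees the same cognitive state repeatedly starts by routing it to the slow reasoner and falls back to the fast policy once the state has been visited often enough, while a previously unseen state still triggers deliberation.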
