CL, AI | Nov 1, 2024

DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems

Aman Gupta, Anirudh Ravichandran, Ziji Zhang, Swair Shah, Anurag Beniwal, Narayanan Sadagopan

arXiv:2411.00427v1 | 4.87 citations | h-index: 19

Originality: Incremental advance

AI Analysis

This paper addresses the problem of handling diverse user intents and domains in dialogue systems for applications such as customer service, though the contribution appears incremental because it builds on existing multi-agent and modeling techniques.

The paper tackles the challenge of building effective multi-domain task-oriented dialogue systems by proposing DARD, a multi-agent approach that achieves state-of-the-art performance on the MultiWOZ benchmark, improving the dialogue inform rate by 6.6% and the success rate by 4.1% over the best-performing existing approaches.

Task-oriented dialogue systems are essential for applications ranging from customer service to personal assistants and are widely used across various industries. However, developing effective multi-domain systems remains a significant challenge due to the complexity of handling diverse user intents, entity types, and domain-specific knowledge across several domains. In this work, we propose DARD (Domain Assigned Response Delegation), a multi-agent conversational system capable of successfully handling multi-domain dialogs. DARD leverages domain-specific agents, orchestrated by a central dialog manager agent. Our extensive experiments compare and utilize various agent modeling approaches, combining the strengths of smaller fine-tuned models (Flan-T5-large & Mistral-7B) with their larger counterparts, Large Language Models (LLMs) (Claude Sonnet 3.0). We provide insights into the strengths and limitations of each approach, highlighting the benefits of our multi-agent framework in terms of flexibility and composability. We evaluate DARD using the well-established MultiWOZ benchmark, achieving state-of-the-art performance by improving the dialogue inform rate by 6.6% and the success rate by 4.1% over the best-performing existing approaches. Additionally, we discuss various annotator discrepancies and issues within the MultiWOZ dataset and its evaluation system.
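The delegation pattern described in the abstract, a central dialog manager that hands each turn to a domain-specific agent, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `DialogManager`, `assign_domain`, and `DOMAIN_KEYWORDS` are hypothetical, the keyword matcher is a simple stand-in for whatever model the authors use to pick a domain, and the agents are stub functions where DARD would call fine-tuned models or an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword lists; a stand-in for a learned domain classifier.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "hotel": ["hotel", "room", "guesthouse"],
    "restaurant": ["restaurant", "food", "table"],
    "train": ["train", "depart", "ticket"],
}


def hotel_agent(utterance: str) -> str:
    # Stub: a real agent would be a fine-tuned model or a prompted LLM.
    return "[hotel] handling: " + utterance


def restaurant_agent(utterance: str) -> str:
    return "[restaurant] handling: " + utterance


def train_agent(utterance: str) -> str:
    return "[train] handling: " + utterance


def fallback_agent(utterance: str) -> str:
    # Used when no domain can be assigned.
    return "[general] Could you tell me more about what you need?"


@dataclass
class DialogManager:
    """Routes each user turn to one domain agent and records the result."""

    agents: Dict[str, Callable[[str], str]]
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def assign_domain(self, utterance: str) -> str:
        text = utterance.lower()
        # Count keyword hits per domain (simple substring matching).
        scores = {
            domain: sum(kw in text for kw in keywords)
            for domain, keywords in DOMAIN_KEYWORDS.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            # No keyword matched: stay in the previous turn's domain if any.
            return self.history[-1][0] if self.history else "general"
        return best

    def respond(self, utterance: str) -> str:
        domain = self.assign_domain(utterance)
        agent = self.agents.get(domain, fallback_agent)
        reply = agent(utterance)
        self.history.append((domain, utterance, reply))
        return reply


def build_manager() -> DialogManager:
    return DialogManager(
        agents={
            "hotel": hotel_agent,
            "restaurant": restaurant_agent,
            "train": train_agent,
        }
    )
```

Because the manager only depends on a mapping from domain name to callable, an agent can be swapped or added without touching the routing code, which is one way to read the flexibility and composability the abstract mentions.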
