CL · Sep 5, 2023

Dialog Action-Aware Transformer for Dialog Policy Learning

arXiv:2309.02240v1191 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses the need for faster training in dialog systems. The advance is incremental: it builds on existing RL methods and adds a novel fine-tuning approach.

The paper tackles slow convergence in dialog policy learning by using the knowledge in a pre-trained language model to accelerate reinforcement learning agents, and reports improved efficiency in both simulator and human evaluations.

Recent work usually addresses dialog policy learning (DPL) by training a reinforcement learning (RL) agent to determine the best dialog action. However, existing deep RL approaches require a large volume of agent-user interactions to achieve acceptable performance. In this paper, we propose to make full use of the plain-text knowledge in a pre-trained language model to accelerate the RL agent's learning. Specifically, we design a dialog action-aware transformer encoder (DaTrans), which integrates a new fine-tuning procedure, the masked last action task, to encourage DaTrans to be dialog-aware and to distill action-specific features. DaTrans is then further optimized in an RL setting with ongoing interactions, evolving through exploration in the dialog action space toward maximizing long-term accumulated rewards. The effectiveness and efficiency of the proposed model are demonstrated with both simulator evaluation and human evaluation.
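
The abstract does not spell out how a masked last action training example is built. The sketch below shows one plausible reading, assuming each turn carries a speaker and a dialog action string, and that the final action is replaced by a mask token and used as the prediction target. The names `Turn`, `MASK_TOKEN`, and `build_masked_last_action_example` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder token; the paper's actual vocabulary is not given here.
MASK_TOKEN = "[MASK]"


@dataclass
class Turn:
    speaker: str  # e.g. "user" or "system"
    action: str   # dialog action rendered as plain text


def build_masked_last_action_example(turns: List[Turn]) -> Tuple[List[str], str]:
    """Return (input sequence, target) with the last action masked out.

    Assumption: the encoder sees the full action history except the final
    action, which it must predict.
    """
    if not turns:
        raise ValueError("dialog must contain at least one turn")
    # Keep every earlier turn as "speaker:action".
    inputs = [f"{t.speaker}:{t.action}" for t in turns[:-1]]
    # Hide the last action behind the mask token; it becomes the label.
    last = turns[-1]
    inputs.append(f"{last.speaker}:{MASK_TOKEN}")
    return inputs, last.action


dialog = [
    Turn("user", "inform(area=north)"),
    Turn("system", "request(food)"),
    Turn("user", "inform(food=thai)"),
]
inputs, label = build_masked_last_action_example(dialog)
```

Under this reading, the encoder would be fine-tuned to recover `label` from `inputs` before any RL interaction takes place.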

