Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning
This work addresses a practical problem for developers of task-oriented dialogue systems by enabling optimization of pipelines that contain non-differentiable modules, though it is incremental in that it builds on existing reinforcement learning approaches.
Existing reinforcement learning methods for pipeline task-oriented dialogue systems work only with neural-based modules. The authors address this limitation by proposing post-processing networks (PPNs), which allow systems built from arbitrary modules to be optimized, and they report improved dialogue performance in simulation and human evaluation on the MultiWOZ dataset.
Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system composed of modules implemented with arbitrary methods. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated with reinforcement learning to improve the overall dialogue performance of the system, so the modules themselves need not be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
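To make the post-processing idea concrete, the following is a toy sketch, not the authors' implementation. It assumes a single rule-based module whose output is a 3-bit multi-hot vector, a one-step episode with a binary success reward, and plain REINFORCE with a running-average baseline as the reinforcement learning algorithm. The abstract specifies none of these details, and all names (`rule_based_module`, `PPN`, `train`, `evaluate`) are hypothetical.

```python
import math
import random

N_BITS = 3  # size of the toy multi-hot vector (assumption for this sketch)


def rule_based_module(goal):
    """Hypothetical non-differentiable module: copies the goal but always drops bit 0."""
    out = list(goal)
    out[0] = 0
    return out


def sample_goal(rng):
    """Toy 'correct' output: bit 0 is always required, the other bits vary."""
    return [1] + [rng.randint(0, 1) for _ in range(N_BITS - 1)]


def reward(output, goal):
    """Black-box reward: 1.0 on success, 0.0 otherwise (no gradient needed)."""
    return 1.0 if output == goal else 0.0


class PPN:
    """Per-bit Bernoulli policy with linear logits over the module's output."""

    def __init__(self):
        self.w = [[0.0] * N_BITS for _ in range(N_BITS)]
        self.b = [0.0] * N_BITS

    def probs(self, x):
        out = []
        for j in range(N_BITS):
            z = self.b[j] + sum(self.w[j][i] * x[i] for i in range(N_BITS))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            out.append(1.0 / (1.0 + math.exp(-z)))
        return out

    def act(self, x, rng=None):
        """Sample a post-processed vector; with rng=None, act greedily."""
        p = self.probs(x)
        if rng is None:
            return [1 if pj > 0.5 else 0 for pj in p], p
        return [1 if rng.random() < pj else 0 for pj in p], p

    def update(self, x, y, p, advantage, lr):
        """REINFORCE step: for a Bernoulli bit, d log pi / d logit_j = y_j - p_j."""
        for j in range(N_BITS):
            g = advantage * (y[j] - p[j])
            self.b[j] += lr * g
            for i in range(N_BITS):
                self.w[j][i] += lr * g * x[i]


def train(episodes=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    ppn, baseline = PPN(), 0.0
    for _ in range(episodes):
        goal = sample_goal(rng)
        x = rule_based_module(goal)        # module is treated as a black box
        y, p = ppn.act(x, rng)             # PPN post-processes its output
        r = reward(y, goal)
        ppn.update(x, y, p, r - baseline, lr)
        baseline += 0.05 * (r - baseline)  # running-average baseline
    return ppn


def evaluate(ppn=None):
    """Average reward over all toy goals; ppn=None uses the raw module output."""
    goals = [[1, a, b] for a in (0, 1) for b in (0, 1)]
    total = 0.0
    for goal in goals:
        x = rule_based_module(goal)
        y = x if ppn is None else ppn.act(x)[0]
        total += reward(y, goal)
    return total / len(goals)


if __name__ == "__main__":
    print("module alone:", evaluate(None))
    print("module + PPN:", evaluate(train()))
```

The rule-based module is never modified or differentiated. Only the PPN's parameters change, and the only learning signal is the scalar reward, which is the property the abstract emphasizes. A real system would have one PPN per module and a reward computed over a multi-turn dialogue.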