Neural Dialogue State Tracking with Temporally Expressive Networks
This work targets a specific gap in dialogue state tracking: existing models capture either temporal feature dependencies across turns or temporal state dependencies, but not both.
The authors tackle dialogue state tracking by jointly modeling temporal feature dependencies and temporal state dependencies, which improves the accuracy of turn-level state prediction and state aggregation on standard datasets.
Dialogue state tracking (DST) is an important component of a spoken dialogue system. Existing DST models either ignore temporal feature dependencies across dialogue turns or fail to explicitly model temporal state dependencies within a dialogue. In this work, we propose Temporally Expressive Networks (TEN) to jointly model these two types of temporal dependencies in DST. The TEN model combines recurrent networks with probabilistic graphical models. Evaluated on standard datasets, TEN is shown to improve the accuracy of both turn-level state prediction and state aggregation.
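To make the distinction between turn-level state prediction and state aggregation concrete, the following is a minimal sketch of one simple soft aggregation rule for a single slot. It is an illustrative assumption, not the TEN model: the function names (`aggregate_turn`, `track_dialogue`), the special `none` value meaning "this turn says nothing new about the slot", and the hand-written turn-level probabilities are all hypothetical. In a real tracker, the turn-level distributions would come from a learned model such as a recurrent network.

```python
def aggregate_turn(prev_belief, turn_probs, none_value="none"):
    """Combine the previous belief over slot values with one turn-level prediction.

    Assumed rule (hypothetical, for illustration only):
      - with probability turn_probs[none_value] the turn carries no new
        information, so the previous belief is kept;
      - otherwise the state moves to the value predicted for this turn.
    """
    keep = turn_probs.get(none_value, 0.0)
    new_belief = {}
    for value, prev_p in prev_belief.items():
        carried = keep * prev_p  # mass carried over from the previous turn
        if value == none_value:
            new_belief[value] = carried
        else:
            new_belief[value] = carried + turn_probs.get(value, 0.0)
    return new_belief


def track_dialogue(values, turn_predictions, none_value="none"):
    """Aggregate a sequence of turn-level predictions into per-turn beliefs."""
    # Before the first turn, the slot is assumed to be unfilled.
    belief = {v: (1.0 if v == none_value else 0.0) for v in values}
    history = []
    for turn_probs in turn_predictions:
        belief = aggregate_turn(belief, turn_probs, none_value)
        history.append(belief)
    return history


if __name__ == "__main__":
    slot_values = ["none", "cheap", "expensive"]
    # Hand-written turn-level distributions (assumed, not model output).
    turns = [
        {"none": 0.2, "cheap": 0.7, "expensive": 0.1},    # user mentions a price range
        {"none": 0.9, "cheap": 0.05, "expensive": 0.05},  # user talks about something else
    ]
    for t, belief in enumerate(track_dialogue(slot_values, turns), start=1):
        print(t, max(belief, key=belief.get), belief)
```

Under this rule the aggregated belief remains a valid probability distribution after every turn, and a value stated in an earlier turn persists when later turns are uninformative. A jointly trained model would instead learn how turn-level evidence and the previous state interact, rather than fixing the rule by hand.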