CL Oct 1, 2019

Dialogue Transformers

Vladimir Vlasov, Johannes E. M. Mosig, Alan Nichol

arXiv:1910.00486v33.763 citations Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses dialogue modeling for conversational AI systems, but appears incremental as it adapts an existing architecture (transformer) to a specific application.

The authors address the problem of encoding dialogue history by proposing a transformer-based dialogue policy that uses self-attention to selectively attend to relevant turns. This targets a limitation of RNNs, which by default treat every turn in a sequence as relevant to the encoding. They compare their Transformer Embedding Dialogue (TED) policy against an LSTM and REDP, though the abstract reports no performance numbers.

We introduce a dialogue policy based on a transformer architecture, where the self-attention mechanism operates over the sequence of dialogue turns. Recent work has used hierarchical recurrent neural networks to encode multiple utterances in a dialogue context, but we argue that a pure self-attention mechanism is more suitable. By default, an RNN assumes that every item in a sequence is relevant for producing an encoding of the full sequence, but a single conversation can consist of multiple overlapping discourse segments as speakers interleave multiple topics. A transformer picks which turns to include in its encoding of the current dialogue state, and is naturally suited to selectively ignoring or attending to dialogue history. We compare the performance of the Transformer Embedding Dialogue (TED) policy to an LSTM and to the REDP, which was specifically designed to overcome this limitation of RNNs.
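
The idea of self-attention over dialogue turns can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each turn is already a fixed-size vector, uses the raw turn vectors as queries, keys, and values (a trained model would learn separate projections), and assumes each turn may attend only to itself and earlier turns. The function name and the toy dialogue are hypothetical.

```python
import math

def causal_self_attention(turns):
    """Scaled dot-product self-attention over a list of turn embeddings.

    Simplification: queries, keys, and values are the raw turn vectors
    (identity projections). Each turn sees only itself and earlier turns.
    Returns (contexts, weights), one entry per turn.
    """
    dim = len(turns[0])
    contexts, all_weights = [], []
    for i, query in enumerate(turns):
        visible = turns[: i + 1]  # causal mask: no access to future turns
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The weighted sum of visible turns encodes the dialogue state at turn i.
        context = [
            sum(w * v[j] for w, v in zip(weights, visible))
            for j in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights

# Hypothetical toy dialogue: turns 0 and 2 share a topic, turn 1 is a digression.
turns = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
contexts, weights = causal_self_attention(turns)
```

In this toy input, the final turn places almost all of its attention weight on turns 0 and 2 and nearly none on the digression in turn 1, which is the selective behavior the abstract contrasts with an RNN's default of folding every turn into the encoding.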
