CL | Oct 15, 2021

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

arXiv:2110.08395v2 | 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient domain specialization in task-oriented dialog systems. It offers a modular, resource-efficient solution that is particularly useful for multi-domain applications. The contribution is incremental, since it builds on existing pretraining and adapter methods.

The paper tackles the problem of embedding domain-specific knowledge in pretrained language models for task-oriented dialog. It shows that domain-specific pretraining within the DS-TOD framework improves performance on dialog state tracking and response retrieval across five domains from MultiWOZ. Adapter-based specialization performs comparably to full fine-tuning in single-domain setups and can offer better performance in multi-domain setups.

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for TOD. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DomainCC and DomainReddit -- resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters -- additional parameter-light layers in which we encode the domain knowledge. Our experiments with prominent TOD tasks -- dialog state tracking (DST) and response retrieval (RR) -- encompassing five domains from the MultiWOZ benchmark demonstrate the effectiveness of DS-TOD. Moreover, we show that the light-weight adapter-based specialization (1) performs comparably to full fine-tuning in single domain setups and (2) is particularly suitable for multi-domain specialization, where besides advantageous computational footprint, it can offer better TOD performance.
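The first step described in the abstract, extracting salient domain terms and using them to select domain-relevant text from a large general corpus, can be illustrated with a minimal sketch. The abstract does not say how salience is computed, so the smoothed log-ratio score below is an assumption, and the names `salient_terms` and `filter_corpus` are hypothetical and not taken from the authors' code.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN.findall(text.lower())


def salient_terms(domain_docs, background_docs, top_k=3):
    """Rank tokens by how much more likely they are in-domain than in the background.

    Assumption: salience is an add-one smoothed log ratio of relative
    frequencies. The paper may use a different measure.
    """
    dom = Counter(t for d in domain_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    dom_total = sum(dom.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(dom) | set(bg))

    scores = {}
    for term, count in dom.items():
        p_dom = (count + 1) / (dom_total + vocab_size)
        p_bg = (bg[term] + 1) / (bg_total + vocab_size)
        scores[term] = math.log(p_dom / p_bg)

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def filter_corpus(corpus, terms):
    """Keep only the documents that mention at least one salient term."""
    term_set = set(terms)
    return [doc for doc in corpus if term_set & set(tokenize(doc))]
```

Under these assumptions, the output of `filter_corpus` plays the role of a DomainCC-style resource: a domain-focused subset of a general corpus that could then feed MLM pretraining. A DomainReddit-style resource for response selection would additionally need context-response pairs, which this sketch does not build.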

Code Implementations: 1 repo