CL · AI · Jun 12, 2024

Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests

arXiv:2406.07794v2 · 21 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of smaller models struggling with indirect requests in virtual assistants, though it is incremental, building on existing datasets and methods.

The paper tackles the lack of realistic indirect user requests in task-oriented dialogue datasets by proposing an LLM-based pipeline that generates them synthetically, and it releases the IndirectRequests dataset, built on the SGD corpus, for evaluating smaller models.

Indirect User Requests (IURs), such as "It's cold in here" instead of "Could you please increase the temperature?", are common in human-human task-oriented dialogue and require world knowledge and pragmatic reasoning from the listener. While large language models (LLMs) can handle these requests effectively, smaller models deployed on virtual assistants often struggle due to resource constraints. Moreover, existing task-oriented dialogue benchmarks lack sufficient examples of complex discourse phenomena such as indirectness. To address this, we propose a set of linguistic criteria along with an LLM-based pipeline for generating realistic IURs to test natural language understanding (NLU) and dialogue state tracking (DST) models before deployment in a new domain. We also release IndirectRequests, a dataset of IURs based on the Schema Guided Dialog (SGD) corpus, as a comparative testbed for evaluating the performance of smaller models in handling indirect requests.
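To make the idea of a comparative testbed concrete, here is a minimal sketch, assuming a hypothetical format in which each item pairs a direct request with an indirect paraphrase and a gold intent label. The `RequestPair` type, the intent names, the second example pair and the keyword-matching `keyword_model` are all illustrative assumptions, not the IndirectRequests schema or the authors' pipeline; the keyword model merely stands in for a small model that relies on surface cues.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RequestPair:
    """Hypothetical test item: a direct request, an indirect paraphrase, and the gold intent."""
    direct: str
    indirect: str
    intent: str


def accuracy(model: Callable[[str], str], utterances: list[str], gold: list[str]) -> float:
    """Fraction of utterances for which the model predicts the gold intent."""
    correct = sum(model(u) == g for u, g in zip(utterances, gold))
    return correct / len(gold)


def keyword_model(utterance: str) -> str:
    """Hypothetical stand-in for a small NLU model: it only matches surface keywords."""
    text = utterance.lower()
    if "temperature" in text:
        return "IncreaseTemperature"
    if "restaurant" in text or "table" in text:
        return "ReserveRestaurant"
    return "Unknown"


# Illustrative items only; intent names and the second pair are invented for this sketch.
pairs = [
    RequestPair("Could you please increase the temperature?", "It's cold in here.", "IncreaseTemperature"),
    RequestPair("Book a table at a restaurant for two.", "We're starving and it's date night.", "ReserveRestaurant"),
]
gold = [p.intent for p in pairs]

# Score the same model on the direct and the indirect phrasing of each request.
direct_acc = accuracy(keyword_model, [p.direct for p in pairs], gold)
indirect_acc = accuracy(keyword_model, [p.indirect for p in pairs], gold)
print(f"direct: {direct_acc:.2f}, indirect: {indirect_acc:.2f}")
```

Comparing the two scores for one model isolates how much accuracy drops when the same intent is expressed indirectly; a real evaluation would use the released data and an actual NLU or DST model in place of the keyword matcher.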
