CL, Dec 1, 2019

Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives

arXiv:1912.00342v2994 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses low-resource language comprehension for dialog systems, though its contribution is incremental, focusing on corpus creation and class-imbalance mitigation.

The paper tackles the challenge of dialog agents understanding non-canonical speech forms, such as paraphrased directives, by proposing guidelines to build a parallel corpus and constructing a Korean dataset of 50K question/command-intent pairs to improve intent extraction.

Modern dialog managers are expected to exhibit human-level conversational skills, including but not limited to handling discourse with no clear objective. In addition, agents are expected to infer intent from the user's dialogue even when it takes non-canonical forms of speech, which depends on the agent's comprehension of paraphrased forms of such utterances. Especially in low-resource languages, the lack of data is a bottleneck that hinders progress in comprehension performance for these agents. Here we demonstrate the necessity of extracting the intent argument of non-canonical directives in a natural language format, which may yield more accurate parsing, and suggest guidelines for building a parallel corpus for this purpose. Following the guidelines, we construct a Korean corpus of 50K question/command-intent pairs, including labels for classifying the utterance type. We also propose a method for mitigating class imbalance, demonstrating the potential applications of the corpus generation method and its multilingual extensibility.
