CL, AI · May 18, 2022

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

Kevin Yang, Olivia Deng, Charles Chen, Richard Shin, Subhro Roy, Benjamin Van Durme

arXiv:2205.08675v1

Originality: Incremental advance

AI Analysis

This paper addresses resource and privacy constraints in semantic parsing for task-oriented systems. The contribution is incremental because it builds on existing data augmentation methods.

The paper tackles low-resource semantic parsing under three constraints: no related-domain data, no way to sample logical forms directly from a grammar, and privacy requirements on unlabeled utterances. Its approach generates canonical utterances for logical forms, simulates natural language from them, and filters the resulting pairs. This yields a 33% relative improvement in top-1 match over a non-data-augmented baseline on the SMCalFlow dataset.
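The 33% figure is a relative gain, not an absolute one. The summary does not define it, so the block assumes the standard definition of relative improvement. Here $m_{\text{base}}$ and $m_{\text{aug}}$ are symbols introduced for illustration: they stand for the top-1 match of the baseline parser and the augmented parser. The 30.0 baseline is a hypothetical value, not a reported result.

```latex
\text{relative improvement} = \frac{m_{\text{aug}} - m_{\text{base}}}{m_{\text{base}}} = 0.33
\quad\Longrightarrow\quad
m_{\text{aug}} = 1.33\, m_{\text{base}},
\qquad \text{e.g. } m_{\text{base}} = 30.0 \;\Rightarrow\; m_{\text{aug}} = 39.9
```

Under this reading, a hypothetical 30.0 baseline would rise by 9.9 points, not by 33 points.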

We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe 33% relative improvement over a non-data-augmented baseline in top-1 match.
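The abstract describes the augmentation pipeline only at a high level. It pairs canonical utterances with logical forms, simulates natural language for each, and filters the resulting pairs. The Python sketch below shows that generate, paraphrase, filter loop under several assumptions:

- `toy_paraphrase` and `toy_parse` are hypothetical stand-ins for a paraphrase model and a low-resource parser.
- The logical-form strings are invented.
- `round_trip_keep` is one plausible filtering criterion, not necessarily the one used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Example:
    """One synthetic training pair: a natural-language utterance and its logical form."""
    utterance: str
    logical_form: str


def augment(
    canonical_pairs: Iterable[tuple[str, str]],
    paraphrase: Callable[[str], list[str]],
    keep: Callable[[str, str], bool],
) -> list[Example]:
    """Turn (canonical utterance, logical form) pairs into filtered synthetic examples.

    `paraphrase` simulates natural utterances for a canonical utterance;
    `keep` decides whether a simulated (utterance, logical form) pair is retained.
    """
    seen: set[tuple[str, str]] = set()
    examples: list[Example] = []
    for canonical, logical_form in canonical_pairs:
        for utterance in paraphrase(canonical):
            pair = (utterance, logical_form)
            if pair in seen:
                continue  # skip exact duplicates
            seen.add(pair)
            if keep(utterance, logical_form):
                examples.append(Example(utterance, logical_form))
    return examples


# --- Toy stand-ins (hypothetical; a real system would use trained models) ---

def toy_paraphrase(canonical: str) -> list[str]:
    # Hypothetical paraphraser: a fixed lookup table instead of a generation model.
    table = {
        "create event standup tomorrow": [
            "set up standup tomorrow",
            "set up standup",  # drops the date, so the filter should reject it
            "set up standup tomorrow",  # exact duplicate, skipped by augment()
        ],
    }
    return table.get(canonical, [])


def toy_parse(utterance: str) -> str:
    # Hypothetical low-resource parser: keyword rules instead of a neural model.
    if "standup" in utterance and "tomorrow" in utterance:
        return "(CreateEvent standup tomorrow)"
    return "(Unknown)"


def round_trip_keep(utterance: str, logical_form: str) -> bool:
    # Assumed filter: keep a pair only if the parser maps the utterance back to its logical form.
    return toy_parse(utterance) == logical_form


synthetic = augment(
    [("create event standup tomorrow", "(CreateEvent standup tomorrow)")],
    toy_paraphrase,
    round_trip_keep,
)
for ex in synthetic:
    print(ex.utterance, "->", ex.logical_form)
```

The sketch passes the paraphraser and filter in as plain functions, so either toy can be swapped for a real model without changing `augment`. Here only "set up standup tomorrow" survives. The date-dropping paraphrase fails the round-trip check, and the duplicate is skipped.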

