CL | Nov 5, 2018

Improving Span-based Question Answering Systems with Coarsely Labeled Data

Hao Cheng, Ming-Wei Chang, Kenton Lee, Ankur Parikh, Michael Collins, Kristina Toutanova

arXiv:1811.02076v1 | 0.51 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in span-based QA systems by leveraging coarsely labeled data, offering an incremental improvement over standard multi-task learning approaches.

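The paper tackles the problem of improving fine-grained short-answer question answering models by integrating coarse-grained data annotated for paragraph-level relevance. It reports that latent-variable methods, namely direct optimization of the marginal log-likelihood and posterior distillation, yield significantly larger gains from coarse supervision than the standard multi-task approach of sharing representations.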

We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains. Experiments demonstrate that the standard multi-task learning approach of sharing representations is not the most effective way to leverage coarse-grained annotations. Instead, we can explicitly model the latent fine-grained short answer variables and optimize the marginal log-likelihood directly or use a newly proposed posterior distillation learning objective. Since these latent-variable methods have explicit access to the relationship between the fine and coarse tasks, they result in significantly larger improvements from coarse supervision.
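To make the idea of optimizing a marginal log-likelihood over latent short answers more concrete, here is a minimal sketch. It is not the paper's implementation. It assumes a hypothetical setup in which a model assigns one unnormalized score to every candidate answer span in a document, each span belongs to exactly one paragraph, and the coarse label only says which paragraph is relevant. Under that assumption, the probability of the coarse label is the total probability of all spans inside the relevant paragraph. The names `marginal_log_likelihood`, `span_scores`, and `span_paragraph` are illustrative only.

```python
import numpy as np


def log_sum_exp(x):
    """Numerically stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def marginal_log_likelihood(span_scores, span_paragraph, relevant_paragraph):
    """Log-probability of a coarse paragraph label, marginalizing over spans.

    Assumption (illustrative, not taken from the paper): span probabilities
    are a softmax over all candidate spans in the document, and a paragraph
    is relevant exactly when the latent answer span lies inside it.

    span_scores:        unnormalized score for each candidate span
    span_paragraph:     paragraph index that each candidate span belongs to
    relevant_paragraph: index of the paragraph labeled as relevant
    """
    span_scores = np.asarray(span_scores, dtype=float)
    span_paragraph = np.asarray(span_paragraph)
    in_paragraph = span_paragraph == relevant_paragraph
    if not in_paragraph.any():
        raise ValueError("no candidate span lies in the relevant paragraph")
    # log sum_{z in paragraph} exp(s_z)  -  log sum_{all z} exp(s_z)
    return log_sum_exp(span_scores[in_paragraph]) - log_sum_exp(span_scores)


# Four candidate spans, two per paragraph; paragraph 1 is labeled relevant.
scores = [2.0, 0.5, 1.0, -1.0]
paragraphs = [0, 0, 1, 1]
print(marginal_log_likelihood(scores, paragraphs, relevant_paragraph=1))
```

Maximizing this quantity on coarsely labeled examples pushes probability mass toward spans inside the relevant paragraph without committing to any single span, which is one way a model can have explicit access to the relationship between the fine and coarse tasks.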
