CL, Nov 5, 2018

Improving Span-based Question Answering Systems with Coarsely Labeled Data

arXiv:1811.02076v1, 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in span-based QA systems by leveraging coarsely labeled data, offering an incremental improvement over standard multi-task learning approaches.

The paper addresses how to improve fine-grained short-answer question answering models by integrating coarse-grained data annotated for paragraph-level relevance. It reports significant performance gains from latent-variable methods: optimizing the marginal log-likelihood directly, or using a proposed posterior distillation objective.

We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains. Experiments demonstrate that the standard multi-task learning approach of sharing representations is not the most effective way to leverage coarse-grained annotations. Instead, we can explicitly model the latent fine-grained short answer variables and optimize the marginal log-likelihood directly or use a newly proposed posterior distillation learning objective. Since these latent-variable methods have explicit access to the relationship between the fine and coarse tasks, they result in significantly larger improvements from coarse supervision.
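The marginal log-likelihood idea in the abstract can be illustrated with a small sketch. It is not the paper's formulation, which the abstract does not spell out. It assumes a span model that scores every candidate answer span across several paragraphs, with the coarse label naming only the relevant paragraph. Under that assumption, the probability of the coarse label is the sum of the probabilities of all spans inside the relevant paragraph. The names `span_scores`, `span_paragraph`, and `marginal_log_likelihood` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_likelihood(span_scores, span_paragraph, relevant_paragraph):
    """Log-probability of the coarse label, marginalizing over latent spans.

    Assumption (not from the source): P(span) is a softmax over all candidate
    spans, and the coarse label is explained by any span that lies in the
    relevant paragraph.
    """
    consistent = [
        score
        for score, paragraph in zip(span_scores, span_paragraph)
        if paragraph == relevant_paragraph
    ]
    if not consistent:
        raise ValueError("no candidate span lies in the relevant paragraph")
    # log sum_{span in paragraph} softmax(span)
    return logsumexp(consistent) - logsumexp(span_scores)


# Hypothetical scores for four candidate spans spread over two paragraphs.
span_scores = [2.0, 0.5, 1.0, -1.0]
span_paragraph = [0, 0, 1, 1]

# Training loss for an example whose coarse label marks paragraph 0 as relevant.
loss = -marginal_log_likelihood(span_scores, span_paragraph, relevant_paragraph=0)
```

Minimizing this loss raises the total probability of spans in the relevant paragraph without committing to any single span. This is one way a model can have explicit access to the relationship between the fine and coarse tasks.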
