CL, LG · Apr 16, 2014

Open Question Answering with Weakly Supervised Embedding Models

arXiv:1404.4326v1 · 353 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of building scalable question-answering systems for any subject without extensive human labeling or domain-specific tools. The advance is incremental in that it builds on prior weakly supervised approaches.

The paper tackles the problem of open-domain question answering by learning to map questions and answers into a shared vector space, enabling schema-agnostic queries without grammars or lexicons, and achieves major improvements over the existing Paralex method using weakly supervised data.

Building computers able to answer questions on any subject is a long-standing goal of artificial intelligence. Promising progress has recently been achieved by methods that learn to map questions to logical forms or database queries. Such approaches can be effective, but at the cost of either large amounts of human-labeled data or lexicons and grammars tailored by practitioners. In this paper, we instead take the radical approach of learning to map questions to vectorial feature representations. By mapping answers into the same space, one can query any knowledge base independent of its schema, without requiring any grammar or lexicon. Our method is trained with a new optimization procedure that combines stochastic gradient descent with a subsequent fine-tuning step, using the weak supervision provided by blending automatically and collaboratively generated resources. We empirically demonstrate that our model can capture meaningful signals from its noisy supervision, leading to major improvements over Paralex, the only existing method able to be trained on similar weakly labeled data.
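
To make the shared-vector-space idea concrete, here is a minimal sketch, not the authors' implementation. It assumes bag-of-words embeddings (a question or answer is the sum of its token vectors), a dot-product score, and a margin ranking loss trained by plain stochastic gradient descent. The toy question/answer pairs, the dimension `DIM`, the learning rate, the margin, and every function name are hypothetical choices for illustration. The paper's fine-tuning step and its weakly supervised training data are not reproduced.

```python
import random

DIM = 8  # embedding size; an arbitrary choice for this toy example


def make_table(vocab, rng):
    """Give every token a small random vector (sorted for reproducibility)."""
    return {tok: [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
            for tok in sorted(vocab)}


def embed(tokens, table):
    """Bag-of-words embedding: the sum of the token vectors."""
    vec = [0.0] * DIM
    for tok in tokens:
        for i, x in enumerate(table[tok]):
            vec[i] += x
    return vec


def score(question, answer, q_table, a_table):
    """Dot product between the question and answer embeddings."""
    q = embed(question, q_table)
    a = embed(answer, a_table)
    return sum(x * y for x, y in zip(q, a))


def sgd_step(question, pos, neg, q_table, a_table, lr=0.05, margin=0.1):
    """One margin-ranking update; returns True if the parameters changed."""
    s_pos = score(question, pos, q_table, a_table)
    s_neg = score(question, neg, q_table, a_table)
    if s_pos - s_neg >= margin:
        return False  # ranking constraint already satisfied
    # Compute all embeddings before touching any parameter.
    q = embed(question, q_table)
    a_pos = embed(pos, a_table)
    a_neg = embed(neg, a_table)
    # Gradient descent on: margin - q.a_pos + q.a_neg
    for tok in question:
        for i in range(DIM):
            q_table[tok][i] += lr * (a_pos[i] - a_neg[i])
    for tok in pos:
        for i in range(DIM):
            a_table[tok][i] += lr * q[i]
    for tok in neg:
        for i in range(DIM):
            a_table[tok][i] -= lr * q[i]
    return True


def best_answer(question, candidates, q_table, a_table):
    """Return the candidate answer with the highest score."""
    return max(candidates,
               key=lambda ans: score(question, ans, q_table, a_table))


# Hypothetical toy data: each question is paired with one answer triple.
pairs = [
    (["who", "wrote", "hamlet"],
     ["shakespeare", "author-of", "hamlet"]),
    (["what", "is", "the", "capital", "of", "france"],
     ["paris", "capital-of", "france"]),
]
candidates = [answer for _, answer in pairs]

rng = random.Random(0)
q_table = make_table({t for q, _ in pairs for t in q}, rng)
a_table = make_table({t for _, a in pairs for t in a}, rng)

# Train: every other candidate serves as a negative example.
for _ in range(200):
    for question, pos in pairs:
        for neg in candidates:
            if neg != pos:
                sgd_step(question, pos, neg, q_table, a_table)
```

Calling `best_answer(pairs[0][0], candidates, q_table, a_table)` then ranks the candidate triples for the first question. Because answers are scored only through their embeddings, nothing in the sketch depends on the schema of the knowledge base, which is the property the abstract emphasizes.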

