Semantic Matching Against a Corpus: New Applications and Methods
This work provides a proof-of-concept for semantic matching applications and addresses a specific need of emergency managers, but the contribution is incremental because it builds on existing entailment methods.
The paper tackles the problem of semantically matching natural language propositions against a corpus and introduces a new task for domain experts such as emergency managers. In a user study on disaster recovery queries, a model built from natural language entailment data outperforms simple word-vector averaging.
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.
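To make the baseline concrete, the following is a minimal sketch of matching a proposition against a corpus by word-vector averaging. It is not the paper's implementation. The toy 4-dimensional vectors, the function names (`average_vector`, `cosine`, `rank_corpus`), and the use of cosine similarity as the matching score are all assumptions made for illustration. A real system would load pretrained word embeddings, and the entailment-based model described above is not shown here.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# Dimensions loosely stand for: water, damage, housing, finance.
TOY_VECTORS = {
    "flood":     [0.9, 0.1, 0.0, 0.0],
    "water":     [0.8, 0.2, 0.1, 0.0],
    "damaged":   [0.1, 0.9, 0.0, 0.0],
    "destroyed": [0.2, 0.8, 0.1, 0.0],
    "homes":     [0.0, 0.2, 0.9, 0.0],
    "houses":    [0.1, 0.1, 0.9, 0.0],
    "council":   [0.0, 0.0, 0.1, 0.9],
    "budget":    [0.0, 0.1, 0.0, 0.9],
}


def average_vector(text, vectors):
    """Average the vectors of all in-vocabulary tokens; None if there are none."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return None
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(vectors[token]):
            total[i] += value
    return [x / len(tokens) for x in total]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_corpus(proposition, corpus, vectors):
    """Return (score, sentence) pairs sorted from best to worst match."""
    query = average_vector(proposition, vectors)
    scored = []
    for sentence in corpus:
        candidate = average_vector(sentence, vectors)
        # Sentences (or queries) with no known words get a score of 0.0.
        if query is None or candidate is None:
            score = 0.0
        else:
            score = cosine(query, candidate)
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    corpus = [
        "council approved budget",
        "water destroyed houses",
    ]
    for score, sentence in rank_corpus("flood damaged homes", corpus, TOY_VECTORS):
        print(f"{score:.3f}  {sentence}")
```

With these toy vectors, the paraphrase "water destroyed houses" ranks above the unrelated budget sentence even though it shares no words with the query. Averaging discards word order and negation, which is one plausible reason a model trained on entailment data could produce better matches for full propositions.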