Evaluating Semantic Parsing against a Simple Web-based Question Answering Model
This work proposes a sanity check for semantic parsing in question answering: compare semantic parsers against a simple web-based baseline. The contribution is incremental, since it focuses on benchmarking rather than on new parsing methods.
The paper evaluates semantic parsing models by comparing them to a simple web-based question answering baseline, and finds that the baseline reaches 35 F1 on the COMPLEXQUESTIONS dataset, compared to 41 F1 for the state of the art.
Semantic parsing shines at analyzing complex natural language that involves composition and computation over multiple pieces of evidence. However, datasets for semantic parsing contain many factoid questions that can be answered from a single web document. In this paper, we propose to evaluate semantic parsing-based question answering models by comparing them to a question answering baseline that queries the web and extracts the answer only from web snippets, without access to the target knowledge-base. We investigate this approach on COMPLEXQUESTIONS, a dataset designed to focus on compositional language, and find that our model obtains reasonable performance (35 F1 compared to 41 F1 of state-of-the-art). We find in our analysis that our model performs well on complex questions involving conjunctions, but struggles on questions that involve relation composition and superlatives.
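The F1 figures quoted above compare predicted answers against gold answers. The abstract does not spell out how F1 is computed, so the following is only a minimal sketch under one common assumption: each question has a set of gold answer strings, F1 is computed per question over answer sets, and the reported number is the average over questions. The function names (`normalize`, `answer_f1`, `average_f1`) and the normalization rule are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(answer.lower().split())


def answer_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Set-based F1 between predicted and gold answers for one question."""
    pred: Set[str] = {normalize(a) for a in predicted}
    ref: Set[str] = {normalize(a) for a in gold}
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_f1(examples: List[Tuple[List[str], List[str]]]) -> float:
    """Mean per-question F1 over (predicted, gold) pairs."""
    if not examples:
        return 0.0
    return sum(answer_f1(p, g) for p, g in examples) / len(examples)
```

Under this assumption, a system that returns one correct answer plus one spurious answer for a single-answer question gets precision 0.5 and recall 1.0 on that question, so extra guesses are penalized even when the right answer is present.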