CL AI IR LG · May 2, 2020

AVA: an Automatic eValuation Approach to Question Answering Systems

arXiv:2005.00705v11.716 citations

Originality: Incremental advance

AI Analysis

The paper provides an automatic evaluation tool for question answering systems. The advance is incremental because it builds on existing Transformer models and benchmarks.

The paper tackles the problem of automatically evaluating question answering systems by introducing AVA, a Transformer-based approach that estimates system accuracy by measuring similarity between reference and automatic answers. The method achieves up to 74.7% F1 score in predicting human judgment for single answers and can evaluate overall system accuracy with RMSE ranging from 0.02 to 0.09.

We introduce AVA, an automatic evaluation approach for Question Answering, which given a set of questions associated with Gold Standard answers, can estimate system Accuracy. AVA uses Transformer-based language models to encode question, answer, and reference text. This allows for effectively measuring the similarity between the reference and an automatic answer, biased towards the question semantics. To design, train and test AVA, we built multiple large training, development, and test sets on both public and industrial benchmarks. Our innovative solutions achieve up to 74.7% in F1 score in predicting human judgement for single answers. Additionally, AVA can be used to evaluate the overall system Accuracy with an RMSE, ranging from 0.02 to 0.09, depending on the availability of multiple references.
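The two reported figures correspond to two levels of evaluation: F1 measures how well per-answer correctness predictions match human judgement, while RMSE measures how far the estimated system Accuracy falls from the true Accuracy. The following is a minimal sketch of those metrics only, not the paper's implementation. The function names and the boolean inputs are hypothetical, and the boolean judgments stand in for whatever AVA's Transformer-based model would output.

```python
import math


def estimated_accuracy(judgments):
    """Fraction of answers an automatic evaluator judged correct.

    `judgments` is a hypothetical list of booleans, one per question.
    """
    return sum(judgments) / len(judgments)


def f1_score(predicted, gold):
    """F1 of predicted correctness labels against human (gold) labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(estimated, actual):
    """Root mean squared error between estimated and true accuracies,
    computed over several evaluated systems."""
    squared_errors = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Under these assumptions, `f1_score` would be applied to single-answer judgments for one system, and `rmse` to a list of per-system Accuracy estimates paired with their human-annotated Accuracy values.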
