CL · Mar 18, 2022

Simulating Bandit Learning from User Feedback for Extractive Question Answering

arXiv:2203.10079v132.1646 citations · h-index: 36 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the data annotation bottleneck for question answering systems. The advance is incremental because it builds on existing bandit learning and simulation methods.

The paper aims to reduce data annotation for extractive question answering by simulating user feedback and treating learning from it as a contextual bandit problem. It shows that systems trained on few examples can improve dramatically with feedback on their predicted answers, and that systems can be deployed in new domains without annotation by improving on-the-fly.

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, but instead improving the system on-the-fly via user feedback.
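The bandit setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy softmax policy over a fixed list of candidate answer spans for one question, an exact-match reward computed from a gold span (standing in for simulated user feedback), and a plain REINFORCE-style update. All names (`simulate_feedback`, `bandit_step`, `run_simulation`) and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw span scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_feedback(predicted_span, gold_span):
    """Simulated user feedback derived from supervised data.

    Assumption: reward is 1.0 on an exact span match and 0.0 otherwise.
    """
    return 1.0 if predicted_span == gold_span else 0.0


def bandit_step(scores, candidates, gold_span, lr, rng):
    """One bandit round: sample an answer, observe feedback on it, update."""
    probs = softmax(scores)
    # The system only sees feedback for the span it actually predicted.
    action = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    reward = simulate_feedback(candidates[action], gold_span)
    # REINFORCE: gradient of log pi(action) w.r.t. scores is one_hot - probs.
    new_scores = [
        s + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (s, p) in enumerate(zip(scores, probs))
    ]
    return new_scores, reward


def run_simulation(num_rounds=500, lr=0.5, seed=0):
    """Toy run on a single question with three hypothetical candidate spans."""
    rng = random.Random(seed)
    candidates = [(0, 2), (3, 5), (6, 9)]  # (start, end) token indices
    gold_span = (3, 5)                     # known only to the simulator
    scores = [0.0, 0.0, 0.0]               # uninformed initial policy
    for _ in range(num_rounds):
        scores, _ = bandit_step(scores, candidates, gold_span, lr, rng)
    return softmax(scores)


if __name__ == "__main__":
    print(run_simulation())
```

In this sketch the gold span is never shown to the policy directly. It is used only to generate the reward for the sampled prediction, which is what distinguishes the bandit setting from ordinary supervised training. A real system would replace the score list with a neural span-prediction model conditioned on the question and passage.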
