CL, Mar 18, 2022

Simulating Bandit Learning from User Feedback for Extractive Question Answering

arXiv:2203.10079v1646 citations
h-index: 36
Originality: Incremental advance
AI Analysis

This paper addresses the data annotation bottleneck for question answering systems. The advance is incremental, since it builds on existing bandit learning and simulation methods.

The paper aims to reduce data annotation for extractive question answering by simulating user feedback and framing learning from it as a contextual bandit problem. It shows that systems trained on few examples can improve dramatically given feedback on their predicted answers, and that systems can be deployed in new domains without annotation, improving on-the-fly from user feedback.

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, but instead improving the system on-the-fly via user feedback.
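
As a rough illustration of the contextual bandit framing described in the abstract, the toy sketch below simulates feedback from supervised data: the policy samples one candidate answer, a simulated user compares it with the gold label, and only that single reward is used for the update. This is not the authors' implementation. The linear softmax policy, the binary exact-match reward, the REINFORCE-style update, the one-hot toy data, and all names (`simulated_feedback`, `train_bandit`, `predict`, `score_candidates`) are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_candidates(weights, features):
    """One linear score per candidate answer (hypothetical toy policy)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def simulated_feedback(predicted, gold):
    """Simulated user: 1.0 if the sampled answer matches the gold label,
    0.0 otherwise. The reward values are an assumption of this sketch."""
    return 1.0 if predicted == gold else 0.0


def train_bandit(examples, num_candidates, num_features,
                 rounds=2000, lr=0.5, seed=0):
    """Learn from bandit feedback: the gold label is never shown to the
    learner, only the reward for the one action it sampled."""
    rng = random.Random(seed)
    weights = [[0.0] * num_features for _ in range(num_candidates)]
    for _ in range(rounds):
        features, gold = rng.choice(examples)
        probs = softmax(score_candidates(weights, features))
        # Sample an answer from the current policy.
        action = rng.choices(range(num_candidates), weights=probs)[0]
        reward = simulated_feedback(action, gold)
        # REINFORCE: d log pi(action | x) / d score_a = 1[a == action] - pi(a | x)
        for a in range(num_candidates):
            grad = (1.0 if a == action else 0.0) - probs[a]
            for f in range(num_features):
                weights[a][f] += lr * reward * grad * features[f]
    return weights


def predict(weights, features):
    """Greedy prediction: index of the highest-scoring candidate."""
    scores = score_candidates(weights, features)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "supervised" data: (one-hot context features, gold candidate index).
EXAMPLES = [
    ([1.0, 0.0, 0.0], 0),
    ([0.0, 1.0, 0.0], 1),
    ([0.0, 0.0, 1.0], 2),
]

if __name__ == "__main__":
    learned = train_bandit(EXAMPLES, num_candidates=3, num_features=3)
    for features, gold in EXAMPLES:
        print(features, "gold:", gold, "predicted:", predict(learned, features))
```

The point of the sketch is the information flow: the gold label is used only inside `simulated_feedback`, so the learner sees the same signal a real user would provide, a judgment on the one answer it showed.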

Code Implementations: 1 repo
