Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering
This work addresses efficient test-time model adaptation for extractive question answering. The contribution appears incremental: it builds on existing bandit frameworks and adds a new collaborative method.
The paper tackles multi-source test-time model adaptation from user feedback by framing it as a stochastic decision-making process. It compares the multi-armed bandit and dueling bandit frameworks; the latter, solved with a novel Co-UCB method, proves more effective on six extractive QA datasets.
In this work, we study multi-source test-time model adaptation from user feedback, where K distinct models are established for adaptation. To allow efficient adaptation, we cast the problem as a stochastic decision-making process that aims to determine the best model after adaptation. We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits. Compared to multi-armed bandit learning, the dueling framework allows pairwise collaboration among the K models, which we solve with a novel method named Co-UCB. Experiments on six extractive question answering (QA) datasets show that the dueling framework using Co-UCB is more effective than other strong baselines for the studied problem.
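To make the bandit framing concrete, the following is a minimal sketch of selecting among K models with the standard UCB1 rule, treating each model as an arm and each piece of user feedback as a binary reward. It is not the paper's Co-UCB: the function names (`ucb_scores`, `run_bandit`), the exploration constant `c`, and the simulated Bernoulli feedback are assumptions made for illustration only. The `pair=True` option merely picks the two arms with the highest upper confidence bounds in each round, loosely mirroring the pairwise selection of a dueling setup. Each chosen arm still receives its own independent feedback here, and the collaborative update described in the abstract is not reproduced.

```python
import math
import random


def ucb_scores(counts, rewards, t, c=2.0):
    """Return a UCB1 score per arm; unpulled arms get +inf so each is tried once."""
    scores = []
    for n, r in zip(counts, rewards):
        if n == 0:
            scores.append(float("inf"))
        else:
            # Empirical mean plus an exploration bonus that shrinks with n.
            scores.append(r / n + math.sqrt(c * math.log(t) / n))
    return scores


def run_bandit(true_rates, rounds=5000, pair=False, seed=0):
    """Simulate model selection from binary feedback (hypothetical setup).

    true_rates[i] is the assumed probability that model i's answer
    receives positive user feedback.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k      # number of times each model was chosen
    rewards = [0.0] * k   # accumulated positive feedback per model
    for t in range(1, rounds + 1):
        scores = ucb_scores(counts, rewards, t)
        ranked = sorted(range(k), key=lambda i: scores[i], reverse=True)
        # One arm per round (bandit) or the top two arms (pairwise).
        chosen = ranked[:2] if pair else ranked[:1]
        for i in chosen:
            feedback = 1.0 if rng.random() < true_rates[i] else 0.0
            counts[i] += 1
            rewards[i] += feedback
    # Report the arm with the highest empirical mean reward.
    best = max(range(k), key=lambda i: rewards[i] / counts[i] if counts[i] else 0.0)
    return best, counts


if __name__ == "__main__":
    best, counts = run_bandit([0.2, 0.5, 0.8])
    print("selected model:", best, "pull counts:", counts)
```

In this simulation the feedback rates are fixed, whereas in test-time adaptation the models themselves change as they are updated from feedback, so the sketch only illustrates the selection step.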