Active Query Synthesis for Preference Learning
For practitioners of preference learning in interactive systems, this work addresses the dual challenges of computational cost and unreliable feedback in active learning, offering a more efficient and robust query selection method.
This paper introduces Info-Synth, an active query synthesis framework for preference learning that generates optimal queries by maximizing mutual information in a continuous space, overcoming the computational bottleneck of pool-based methods. It also proposes a confidence-aware response model to handle ambiguous comparisons, achieving improved efficiency and accuracy across synthetic, text summary, and robot control tasks.
Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume that all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose Info-Synth, an active query synthesis framework that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.
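To make the idea of synthesizing a query by maximizing mutual information concrete, the following is a minimal illustrative sketch, not the Info-Synth algorithm itself. Everything in it is an assumption made for illustration: a linear utility with posterior samples `w_samples`, a hypothetical three-outcome response model (`response_probs`, with an "unsure" option controlled by a margin `delta`) that only captures ambiguity between nearly identical items, a box-shaped query space [-1, 1]^dim, and simple random-restart hill climbing standing in for whatever optimizer the paper actually uses.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def response_probs(w, xa, xb, delta):
    """Hypothetical response model: (P(prefer a), P(prefer b), P(unsure)).
    A margin delta >= 0 reserves probability mass for 'unsure' near ties."""
    diff = sum(wi * (ai - bi) for wi, ai, bi in zip(w, xa, xb))
    p_a = sigmoid(diff - delta)
    p_b = sigmoid(-diff - delta)
    return (p_a, p_b, max(0.0, 1.0 - p_a - p_b))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def mutual_information(w_samples, xa, xb, delta=0.5):
    """Monte Carlo estimate of I(response; w | query):
    entropy of the averaged prediction minus the average per-sample entropy."""
    probs = [response_probs(w, xa, xb, delta) for w in w_samples]
    n = len(probs)
    marginal = [sum(p[k] for p in probs) / n for k in range(3)]
    return entropy(marginal) - sum(entropy(p) for p in probs) / n

def synthesize_query(w_samples, dim, delta=0.5, n_restarts=5,
                     n_steps=200, step=0.2, seed=0):
    """Random-restart hill climbing over a pair (xa, xb) in [-1, 1]^dim.
    This optimizer is an illustrative stand-in, not the paper's procedure."""
    rng = random.Random(seed)
    clip = lambda v: max(-1.0, min(1.0, v))
    best_q, best_mi = None, -math.inf
    for _ in range(n_restarts):
        q = [rng.uniform(-1.0, 1.0) for _ in range(2 * dim)]
        mi = mutual_information(w_samples, q[:dim], q[dim:], delta)
        for _ in range(n_steps):
            cand = [clip(v + rng.gauss(0.0, step)) for v in q]
            cand_mi = mutual_information(w_samples, cand[:dim], cand[dim:], delta)
            if cand_mi > mi:  # keep only improving perturbations
                q, mi = cand, cand_mi
        if mi > best_mi:
            best_q, best_mi = q, mi
    return best_q[:dim], best_q[dim:], best_mi

# Usage: posterior samples drawn from a standard normal prior (an assumption).
sample_rng = random.Random(1)
w_samples = [[sample_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
xa, xb, mi = synthesize_query(w_samples, dim=3)
tie_mi = mutual_information(w_samples, [0.3] * 3, [0.3] * 3)
```

Under this assumed model, a pair of identical items carries essentially zero information (`tie_mi` is about 0), which mirrors the observation above that near-identical comparisons yield low-value responses. The synthesized pair is built directly in the continuous space rather than chosen by scoring a fixed pool.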