Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences
This work addresses the challenge of noisy human feedback in preference learning. The contribution is incremental: it builds on existing active learning methods.
The paper tackles the problem of learning human preferences efficiently when the number of queries is limited. It modifies uncertainty sampling to take the expected output value into account, and reports faster convergence than the uncertainty sampling baseline.
Learning the preferences of a human improves the quality of the interaction with that human. The number of queries available for learning preferences may be limited, especially when interacting with a human, so active learning is essential. One approach to active learning is to use uncertainty sampling to decide how informative a query is. In this paper, we propose a modification to uncertainty sampling that uses the expected output value to speed up the learning of preferences. We compare our approach with the uncertainty sampling baseline and conduct an ablation study to test the validity of each component of our approach.
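To make the idea concrete, the following is a minimal sketch of the difference between plain uncertainty sampling and a variant that also considers the expected output value. It is an illustration under stated assumptions, not the method from the paper: the binary like/dislike setting, the additive combination rule, the `weight` parameter, and all function names are hypothetical. Here the expected output value of a candidate is taken to be the model's predicted probability that the human likes it.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_by_uncertainty(probs):
    """Baseline: index of the candidate with the most uncertain prediction."""
    return max(range(len(probs)), key=lambda i: binary_entropy(probs[i]))


def select_by_uncertainty_and_value(probs, weight=0.5):
    """Hypothetical variant: uncertainty plus a bonus for expected output value.

    ASSUMPTION: the additive form and `weight` are illustrative only; the
    paper's actual combination rule is not given in this text.
    """
    return max(
        range(len(probs)),
        key=lambda i: binary_entropy(probs[i]) + weight * probs[i],
    )


# Predicted probability that the human likes each candidate query (made-up values).
probs = [0.1, 0.45, 0.7, 0.95]

baseline_choice = select_by_uncertainty(probs)
variant_choice = select_by_uncertainty_and_value(probs)
```

With these made-up probabilities, the baseline selects the candidate closest to 0.5 (index 1), while the variant shifts to a candidate that is still uncertain but more likely to be liked (index 2). Raising `weight` pushes the selection further toward candidates with high expected value at the cost of informativeness.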