Score-Based Density Estimation from Pairwise Comparisons
This work addresses the challenge of expert knowledge elicitation and learning from human feedback by providing a method for density estimation from limited pairwise comparison data.
The paper tackles the problem of estimating a target density from pairwise comparisons by relating it to a tempered winner density and learning the score via score-matching, enabling estimation of complex multivariate densities with only hundreds to thousands of comparisons.
We study density estimation from pairwise comparisons, motivated by expert knowledge elicitation and learning from human feedback. We relate the unobserved target (belief) density to a tempered winner density (the marginal density of preferred choices) and learn the winner density's score via score matching. This allows us to estimate the target by 'de-tempering' the estimated winner density's score. We prove that the score vectors of the belief density and the winner density are collinear, linked by a position-dependent tempering field. We give analytical formulas for this field and propose an estimator for it under the Bradley-Terry model. Using a diffusion model trained on tempered samples generated via score-scaled annealed Langevin dynamics, we can learn complex multivariate belief densities of simulated experts from only hundreds to thousands of pairwise comparisons.
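The relationship between a tempered density and its score can be illustrated with a minimal sketch. This is a toy illustration under simplifying assumptions, not the method of the paper: the belief density is a one-dimensional standard Gaussian, the tempering field is replaced by a hypothetical constant `beta`, and plain unadjusted Langevin dynamics stands in for the score-scaled annealed variant. All function names are illustrative. The sketch shows that the score of p(x)^beta is beta times the score of p(x), that dividing by beta recovers the original score, and that running Langevin dynamics with the scaled score yields samples from the tempered density.

```python
import numpy as np


def gaussian_score(x, sigma=1.0):
    """Score (gradient of the log-density) of a zero-mean Gaussian N(0, sigma^2)."""
    return -x / sigma**2


def tempered_score(x, beta, sigma=1.0):
    """Score of the tempered density p(x)^beta (up to normalization): beta * score of p."""
    return beta * gaussian_score(x, sigma)


def de_temper(score_values, beta):
    """Recover the untempered score by dividing out the (assumed constant) tempering factor."""
    return score_values / beta


def langevin_samples(score_fn, n_chains=20000, n_steps=500, step=0.01, seed=0):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)  # initialize every chain from N(0, 1)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x


beta = 2.0  # hypothetical constant tempering factor (assumption; the paper's field varies with position)
samples = langevin_samples(lambda x: tempered_score(x, beta))

# For N(0, 1) tempered by beta, the tempered density is N(0, 1 / beta),
# so the sample variance should be close to 1 / beta (up to discretization error).
print(f"sample variance: {samples.var():.3f} (target about {1.0 / beta:.3f})")
```

In this toy setting the tempering factor is known and constant, so de-tempering is a single division. With a position-dependent field, the factor would have to be estimated at each point before the score could be rescaled.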