Revisiting Score Function Estimators for $k$-Subset Sampling
This paper addresses a fundamental bottleneck in machine learning tasks that require $k$-subset sampling by proposing a gradient estimator that works under weaker assumptions than existing methods, although its improvement over those methods is incremental.
The paper tackles gradient-based optimization for $k$-subset sampling, an operation that is not differentiable, by revisiting score function estimators. It shows how to compute the score function of the $k$-subset distribution efficiently with a discrete Fourier transform and how to reduce the estimator's variance with control variates. The resulting estimator provides exact samples and unbiased gradient estimates and also applies to non-differentiable downstream models. Experiments in feature selection show performance competitive with current methods.
Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding gradient-based optimization. Prior work has focused on relaxed sampling or pathwise gradient estimators. Inspired by the success of score function estimators in variational inference and reinforcement learning, we revisit them within the context of $k$-subset sampling. Specifically, we demonstrate how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform, and reduce the estimator's variance with control variates. The resulting estimator provides both exact samples and unbiased gradient estimates while also applying to non-differentiable downstream models, unlike existing methods. Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
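To make the role of the discrete Fourier transform concrete, the following is a minimal sketch and not the paper's implementation. It assumes the $k$-subset distribution is parametrized as independent Bernoulli variables with logits $\theta_i$, conditioned on exactly $k$ of them being one. Under that assumption the score with respect to the logits is $b_i - P(b_i = 1 \mid \sum_j b_j = k)$, and the required probabilities come from the Poisson-binomial distribution, whose PMF can be obtained by an inverse DFT of its characteristic function. The function names `poisson_binomial_pmf` and `k_subset_score` are hypothetical, and the leave-one-out loop is written for clarity rather than efficiency.

```python
import cmath
import math


def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli(p_j).

    Evaluates the characteristic function at the m-th roots of unity
    (m = n + 1) and applies an inverse DFT to recover the PMF.
    """
    m = len(probs) + 1
    omega = 2j * math.pi / m
    chi = []
    for l in range(m):
        z = cmath.exp(omega * l)
        value = 1 + 0j
        for p in probs:
            value *= (1 - p) + p * z
        chi.append(value)
    return [
        sum(chi[l] * cmath.exp(-omega * l * k) for l in range(m)).real / m
        for k in range(m)
    ]


def k_subset_score(logits, mask, k):
    """Score d/d(theta_i) log p(mask | sum = k) for a 0/1 mask with k ones.

    Assumes 1 <= k <= len(logits). Each entry equals mask_i minus the
    conditional inclusion probability of item i.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must satisfy 1 <= k <= n")
    probs = [1.0 / (1.0 + math.exp(-t)) for t in logits]
    p_sum_k = poisson_binomial_pmf(probs)[k]
    score = []
    for i, (p, b) in enumerate(zip(probs, mask)):
        rest = probs[:i] + probs[i + 1:]
        # P(b_i = 1 | sum = k) = p_i * P(sum of the others = k - 1) / P(sum = k)
        inclusion = p * poisson_binomial_pmf(rest)[k - 1] / p_sum_k
        score.append(b - inclusion)
    return score


if __name__ == "__main__":
    # Score of the subset {0, 2} among four items with k = 2.
    print(k_subset_score([0.3, -1.2, 0.8, 0.1], [1, 0, 1, 0], k=2))
```

Multiplying this score by the downstream loss of the sampled subset gives a score function gradient estimate; subtracting a baseline from the loss before multiplying is one simple form of control variate, since the score has zero mean under the sampling distribution.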