Binary Search with Distributional Predictions
This work addresses a gap in the algorithms-with-predictions literature by allowing the prediction to be a distribution, which matters because modern neural networks naturally output distributions. The contribution is incremental, however, because it is limited to the specific setting of binary search.
The paper studies binary search when the prediction is a probability distribution rather than a point estimate. It shows that there are simple distributions for which the classical point-prediction algorithm does poorly no matter which single prediction it is given. It then presents an algorithm with query complexity O(H(p) + log η), where H(p) is the entropy of the true distribution p and η is the earth mover's distance between p and the predicted distribution, and proves a lower bound showing that this is optimal up to constant factors.
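To make the two quantities in the bound concrete, here is a minimal Python sketch that computes H(p) and η for distributions over the positions of a sorted array. The function names `entropy_bits` and `emd_1d` are illustrative and not taken from the paper. The sketch assumes each distribution is a list of probabilities indexed by array position and that the ground distance is the difference in array index; the paper's exact normalization may differ.

```python
import math


def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def emd_1d(p, q):
    """Earth mover's distance between two distributions on positions 0..n-1.

    Assumption: moving one unit of mass by one array index costs 1.
    In one dimension this equals the sum of absolute CDF differences.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    cdf_gap = 0.0  # running value of CDF_p(i) - CDF_q(i)
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total
```

For instance, a true distribution that places mass 1/2 on each end of the array has H(p) = 1 bit regardless of the array length, so the bound is driven by log η whenever the prediction is far from p.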
Algorithms with (machine-learned) predictions is a powerful framework for combining traditional worst-case algorithms with modern machine learning. However, the vast majority of work in this space assumes that the prediction itself is non-probabilistic, even if it is generated by some stochastic process (such as a machine learning system). This is a poor fit for modern ML, particularly modern neural networks, which naturally generate a distribution. We initiate the study of algorithms with distributional predictions, where the prediction itself is a distribution. We focus on one of the simplest yet fundamental settings: binary search (or searching a sorted array). This setting has one of the simplest algorithms with a point prediction, but what happens if the prediction is a distribution? We show that this is a richer setting: there are simple distributions where using the classical prediction-based algorithm with any single prediction does poorly. Motivated by this, as our main result, we give an algorithm with query complexity $O(H(p) + \log η)$, where $H(p)$ is the entropy of the true distribution $p$ and $η$ is the earth mover's distance between $p$ and the predicted distribution $\hat p$. This also yields the first distributionally-robust algorithm for the classical problem of computing an optimal binary search tree given a distribution over target keys. We complement this with a lower bound showing that this query complexity is essentially optimal (up to constants), and experiments validating the practical usefulness of our algorithm.
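For reference, the point-prediction baseline mentioned above is commonly implemented as a doubling (galloping) search that starts at the predicted position. The sketch below is one such implementation under stated assumptions, not the paper's distributional algorithm: the function name `search_with_point_prediction` and the comparison-counting convention are illustrative choices.

```python
def search_with_point_prediction(arr, target, predicted_index):
    """Search sorted `arr` for `target`, starting from a predicted index.

    Returns (index, comparisons); index is -1 if target is absent.
    The number of comparisons grows with log(distance from prediction).
    """
    n = len(arr)
    if n == 0:
        return -1, 0
    comparisons = 0
    start = min(max(predicted_index, 0), n - 1)

    comparisons += 1
    if arr[start] == target:
        return start, comparisons

    # Gallop away from the prediction with steps 1, 2, 4, ...
    direction = 1 if arr[start] < target else -1
    step, prev = 1, start
    while True:
        probe = start + direction * step
        if probe < 0 or probe >= n:
            probe = max(0, min(n - 1, probe))  # clamp; searched below
            break
        comparisons += 1
        if direction == 1 and arr[probe] >= target:
            break
        if direction == -1 and arr[probe] <= target:
            break
        prev = probe
        step *= 2

    # Ordinary binary search inside the bracketed interval.
    lo, hi = (prev + 1, probe) if direction == 1 else (probe, prev - 1)
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if arr[mid] == target:
            return mid, comparisons
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons
```

A single starting point is what limits this baseline. If the target is equally likely to be at either end of the array, any one predicted index is far from at least one of the two locations, so the expected cost is logarithmic in the array length even though the entropy of that distribution is only 1 bit.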