Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent
This addresses a bottleneck in applying sampling methods to real-life machine learning problems with constraints, offering a novel approach for efficient inference in domains like Bayesian deep neural networks.
The paper tackles the problem of sampling in constrained domains, which is challenging due to implicitly-defined manifolds from constraints like safety and fairness, by proposing a new variational framework with orthogonal-space gradient flow (O-Gradient) that converges to the target distribution without requiring initialization on the manifold, achieving a convergence rate of O~(1/iterations).
Sampling methods, as important inference and learning techniques, are typically designed for unconstrained domains. However, constraints are ubiquitous in machine learning problems, such as those on safety, fairness, robustness, and many other properties that must be satisfied to apply sampling results in real-life applications. Enforcing these constraints often leads to implicitly-defined manifolds, making efficient sampling with constraints very challenging. In this paper, we propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold $\mathcal{G}_0$ defined by general equality constraints. O-Gradient decomposes the gradient into two parts: one decreases the distance to $\mathcal{G}_0$ and the other decreases the KL divergence in the orthogonal space. While most existing manifold sampling methods require initialization on $\mathcal{G}_0$, O-Gradient does not require such prior knowledge. We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{the number of iterations})$ under mild conditions. Our proof relies on a new Stein characterization of conditional measure which could be of independent interest. We implement O-Gradient through both Langevin dynamics and Stein variational gradient descent and demonstrate its effectiveness in various experiments, including Bayesian deep neural networks.