Qijia Jiang

LG
h-index6
6papers
145citations
Novelty63%
AI Score38

6 Papers

MLJun 2, 2025
Machine-Learned Sampling of Conditioned Path Measures

Qijia Jiang, Reuben Cohn-Gordon

We propose algorithms for sampling from posterior path measures $P(C([0, T], \mathbb{R}^d))$ under a general prior process. This leverages ideas from (1) controlled equilibrium dynamics, which gradually transport between two path measures, and (2) optimization in $\infty$-dimensional probability space endowed with a Wasserstein metric, which can be used to evolve a density curve under the specified likelihood. The resulting algorithms are theoretically grounded and can be integrated seamlessly with neural networks for learning the target trajectory ensembles, without access to data.

MLMay 22, 2024
Control, Transport and Sampling: Towards Better Loss Design

Qijia Jiang, David Nabergoj

Leveraging connections between diffusion-based sampling, optimal transport, and stochastic optimal control through their shared links to the Schrödinger bridge problem, we propose novel objective functions that can be used to transport $ν$ to $μ$, consequently sample from the target $μ$, via optimally controlled dynamics. We highlight the importance of the pathwise perspective and the role various optimality conditions on the path measure can play for the design of valid training losses, the careful choice of which offer numerical advantages in implementation. Basing the formalism on Schrödinger bridge comes with the additional practical capability of baking in inductive bias when it comes to Neural Network training.

LGJun 8, 2020
Learning the Truth From Only One Side of the Story

Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning -- applications include lending and recommendation systems. Despite this, there has been surprisingly little progress made in ways to mitigate the effects of the sampling bias that arises. We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution. We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically. Our method leverages variance estimation techniques to efficiently learn under uncertainty, offering a more principled alternative compared to existing approaches.

LGFeb 20, 2020
Optimizing Black-box Metrics with Adaptive Surrogates

Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan et al.

We address the problem of training models with black-box and hard-to-optimize metrics by expressing the metric as a monotonic function of a small number of easy-to-optimize surrogates. We pose the training problem as an optimization over a relaxed surrogate space, which we solve by estimating local gradients for the metric and performing inexact convex projections. We analyze gradient estimates based on finite differences and local linear interpolations, and show convergence of our approach under smoothness assumptions with respect to the surrogates. Experimental results on classification and ranking problems verify the proposal performs on par with methods that know the mathematical formulation, and adds notable value when the form of the metric is unknown.

OCJun 25, 2019
Complexity of Highly Parallel Non-Smooth Convex Optimization

Sébastien Bubeck, Qijia Jiang, Yin Tat Lee et al.

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

LGOct 25, 2018
Subgradient Descent Learns Orthogonal Dictionaries

Yu Bai, Qijia Jiang, Ju Sun

This paper concerns dictionary learning, i.e., sparse coding, a fundamental representation learning problem. We show that a subgradient descent algorithm, with random initialization, can provably recover orthogonal dictionaries on a natural nonsmooth, nonconvex $\ell_1$ minimization formulation of the problem, under mild statistical assumptions on the data. This is in contrast to previous provable methods that require either expensive computation or delicate initialization schemes. Our analysis develops several tools for characterizing landscapes of nonsmooth functions, which might be of independent interest for provable training of deep networks with nonsmooth activations (e.g., ReLU), among numerous other applications. Preliminary experiments corroborate our analysis and show that our algorithm works well empirically in recovering orthogonal dictionaries.