LGMay 22, 2022
Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest PathHaoyuan Cai, Tengyu Ma, Simon Du · stanford
We revisit the incremental autonomous exploration problem proposed by Lim & Auer (2012). In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states: states that are incrementally reachable from an initial state $s_0$ within $L$ steps in expectation. We introduce a new algorithm with stronger sample complexity bounds than existing ones. Furthermore, we also prove the first lower bound for the autonomous exploration problem. In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$. Key in our algorithm design is a connection between autonomous exploration and multi-goal stochastic shortest path, a new problem that naturally generalizes the classical stochastic shortest path problem. This new problem and its connection to autonomous exploration can be of independent interest.
AIJun 10, 2025Code
Robot-Gated Interactive Imitation Learning with Adaptive Intervention MechanismHaoyuan Cai, Zhenghao Peng, Bolei Zhou
Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions, but current methods impose high cognitive demands on human supervisors. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. AIM utilizes a proxy Q-function to mimic the human intervention rule and adjusts intervention requests based on the alignment between agent and human actions. By assigning high Q-values when the agent deviates from the expert and decreasing these values as the agent becomes proficient, the proxy Q-function enables the agent to assess the real-time alignment with the expert and request assistance when needed. Our expert-in-the-loop experiments reveal that AIM significantly reduces expert monitoring efforts in both continuous and discrete control tasks. Compared to the uncertainty-based baseline Thrifty-DAgger, our method achieves a 40% improvement in terms of human take-over cost and learning efficiency. Furthermore, AIM effectively identifies safety-critical states for expert assistance, thereby collecting higher-quality expert demonstrations and reducing overall expert data and environment interactions needed. Code and demo video are available at https://github.com/metadriverse/AIM.
LGApr 30
High-Probability Convergence in Decentralized Stochastic Optimization with Gradient TrackingAleksandar Armacki, Haoyuan Cai, Ali H. Sayed
We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $δ\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
LGOct 2, 2025
Predictive Preference Learning from Human InterventionsHaoyuan Cai, Zhenghao Peng, Bolei Zhou
Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. The key idea of PPL is to bootstrap each human intervention into L future time steps, called the preference horizon, with the assumption that the agent follows the same action and the human makes the same intervention in the preference horizon. By applying preference optimization on these future states, expert corrections are propagated into the safety-critical regions where the agent is expected to explore, significantly improving learning efficiency and reducing human demonstrations needed. We evaluate our approach with experiments on both autonomous driving and robotic manipulation benchmarks and demonstrate its efficiency and generality. Our theoretical analysis further shows that selecting an appropriate preference horizon L balances coverage of risky states with label correctness, thereby bounding the algorithmic optimality gap. Demo and code are available at: https://metadriverse.github.io/ppl
MAApr 28, 2025
Diffusion Stochastic Learning Over Adaptive Competing NetworksYike Zhao, Haoyuan Cai, Ali H. Sayed
This paper studies a stochastic dynamic game between two competing teams, each consisting of a network of collaborating agents. Unlike fully cooperative settings, where all agents share a common objective, each team in this game aims to minimize its own distinct objective. In the adversarial setting, their objectives could be conflicting as in zero-sum games. Throughout the competition, agents share strategic information within their own team while simultaneously inferring and adapting to the strategies of the opposing team. We propose diffusion learning algorithms to address two important classes of this network game: i) a zero-sum game characterized by weak cross-team subgraph interactions, and ii) a general non-zero-sum game exhibiting strong cross-team subgraph interactions. We analyze the stability performance of the proposed algorithms under reasonable assumptions and illustrate the theoretical results through experiments on Cournot team competition and decentralized GAN training.
LGJun 18, 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected MomentumHaoyuan Cai, Sulaiman A. Alghunaim, Ali H. Sayed
Lower-bound analyses for nonconvex strongly-concave minimax optimization problems have shown that stochastic first-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ oracle complexity to find an $\varepsilon$-stationary point. Some works indicate that this complexity can be improved to $\mathcal{O}(\varepsilon^{-3})$ when the loss gradient is Lipschitz continuous. The question of achieving enhanced convergence rates under distinct conditions, remains unresolved. In this work, we address this question for optimization problems that are nonconvex in the minimization variable and strongly concave or Polyak-Lojasiewicz (PL) in the maximization variable. We introduce novel bias-corrected momentum algorithms utilizing efficient Hessian-vector products. We establish convergence conditions and demonstrate a lower iteration complexity of $\mathcal{O}(\varepsilon^{-3})$ for the proposed algorithms. The effectiveness of the method is validated through applications to robust logistic regression using real-world datasets.
LGJan 26, 2024
Diffusion Stochastic Optimization for Min-Max ProblemsHaoyuan Cai, Sulaiman A. Alghunaim, Ali H. Sayed
The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional stochastic version suffers from the need for a large batch size on the order of $\mathcal{O}(\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Lojasiewicz (PL) risk functions. We also extend the applicability of the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
QUANT-PHJul 19, 2021
Sample Complexity of Learning Parametric Quantum CircuitsHaoyuan Cai, Qi Ye, Dong-Ling Deng
Quantum computers hold unprecedented potentials for machine learning applications. Here, we prove that physical quantum circuits are PAC (probably approximately correct) learnable on a quantum computer via empirical risk minimization: to learn a parametric quantum circuit with at most $n^c$ gates and each gate acting on a constant number of qubits, the sample complexity is bounded by $\tilde{O}(n^{c+1})$. In particular, we explicitly construct a family of variational quantum circuits with $O(n^{c+1})$ elementary gates arranged in a fixed pattern, which can represent all physical quantum circuits consisting of at most $n^c$ elementary gates. Our results provide a valuable guide for quantum machine learning in both theory and practice.