Cambridge Yang

AI
h-index20
5papers
59citations
Novelty51%
AI Score47

5 Papers

LGMar 9, 2023
Computably Continuous Reinforcement-Learning Objectives are PAC-learnable

Cambridge Yang, Michael Littman, Michael Carbin · cambridge

In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable: There are algorithms that learn a near-optimal policy with high probability using a finite amount of samples and computation. In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards, such as objectives specified as linear temporal logic formulas. However, questions about the PAC-learnability of these new objectives have remained open. This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability in two analysis settings. In particular, for the analysis that considers only sample complexity, we prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. Further, for the analysis that considers computational complexity, we prove that if an objective is computable, then it is PAC-learnable. In other words, if a procedure computes successive approximations of the objective's value, then the objective is PAC-learnable. We give three applications of our condition on objectives from the literature with previously unknown PAC-learnability and prove that these objectives are PAC-learnable. Overall, our result helps verify existing objectives' PAC-learnability. Also, as some studied objectives that are not uniformly continuous have been shown to be not PAC-learnable, our results could guide the design of new PAC-learnable objectives.

AISep 30, 2025
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

Yichao Liang, Dat Nguyen, Cambridge Yang et al. · cambridge

Long-horizon embodied planning is challenging because the world does not only change through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic cause-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.

AIOct 22, 2025
Benchmarking World-Model Learning

Archana Warrier, Dat Nguyen, Michelangelo Naim et al. · cambridge

Model-learning agents should gather information to learn world models that support many downstream tasks and inferences, such as predicting unobserved states, estimating near- and far-term consequences of actions, planning action sequences, and detecting changes in dynamics. Current methods for learning and evaluating world models diverge from this goal: training and evaluation are anchored to next-frame prediction, and success is scored by reward maximization in the same environment. We propose WorldTest, a protocol to evaluate model-learning agents that separates reward-free interaction from a scored test phase in a different but related environment. WorldTest is open-ended$\unicode{x2014}$models should support many different tasks unknown ahead of time$\unicode{x2014}$and agnostic to model representation, allowing comparison across approaches. We instantiated WorldTest with AutumnBench, a suite of 43 interactive grid-world environments and 129 tasks across three families: masked-frame prediction, planning, and predicting changes to the causal dynamics. We compared 517 human participants and three frontier models on AutumnBench. We found that humans outperform the models, and scaling compute improves performance only in some environments but not others. WorldTest provides a novel template$\unicode{x2014}$reward-free exploration, derived tests, and behavior-based scoring$\unicode{x2014}$to evaluate what agents learn about environment dynamics, and AutumnBench exposes significant headroom in world-model learning.

11.9LGMay 22
Archimedean Copula Inference via Taylor-Mode AD

Cambridge Yang, Dongdong Li · cambridge

No existing nested Archimedean copula tool handles all three of (a) arbitrary per-variable (right-)censoring in survival analysis, (b) arbitrary nesting trees, and (c) exact parameter gradients. Existing implementations handle only bivariate problems, low dimensional (i.e., $d \leq 10$) cases, two layers of nesting, or only hand-derived copula nestings. We present \textsc{acopula}, a JAX-native framework that, given any Archimedean generator -- classical or neural -- evaluates exact nested-copula likelihoods and parameter gradients under arbitrary censoring masks in polynomial time. The mechanism is polynomial powering of Taylor-mode automatic differentiation output, which replaces per-family hand-derived partial Bell polynomial tables with a single differentiable computation that any user-defined generator can drive. We conduct extensive simulations to verify the correctness of \textsc{acopula}. We then demonstrate (a) per-variable censoring on $85{,}229$ MIMIC-IV ICU admissions in high dimensions with $d{=}53$, fit by both classical Archimedean families and nested neural Archimedean copulas; (b) an 11-sector hierarchical model on S\&P~500 daily returns at $d{=}98$; (c) family-agnostic censored MLE across ten families, five of them with no prior implementation, on a retinopathy study; and (d) a ${\sim}650\times$ per-density speedup over R's \texttt{nacLL} at $d{=}35$, scaling quadratically to $d{=}8{,}000$.

AINov 24, 2021
On the (In)Tractability of Reinforcement Learning for LTL Objectives

Cambridge Yang, Michael Littman, Michael Carbin · cambridge

In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas that are decidable within a finite horizon. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for LTL objectives that are not decidable within a finite horizon.