MLDec 15, 2024
Deep Learning-based Approaches for State Space Models: A Selective ReviewJiahe Lin, George Michailidis
State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs, and presents a unified perspective for discrete time deep state space models and continuous time ones such as latent neural Ordinary Differential and Stochastic Differential Equations. It starts with an overview of the classical maximum likelihood based approach for learning SSMs, reviews variational autoencoder as a general learning pipeline for neural network-based approaches in the presence of latent variables, and discusses in detail representative deep learning models that fall under the SSM framework. Very recent developments, where SSMs are used as standalone architectural modules for improving efficiency in sequence modeling, are also examined. Finally, examples involving mixed frequency and irregularly-spaced time series data are presented to demonstrate the advantage of SSMs in these settings.
LGOct 9, 2025
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy OptimizationKevin Rojas, Jiahe Lin, Kashif Rasul et al.
Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning to DLMs remains an open challenge because of the intractable likelihood. Pioneering work such as diffu-GRPO estimated token-level likelihoods via one-step unmasking. While computationally efficient, this approach is severely biased. A more principled foundation lies in sequence-level likelihoods, where the evidence lower bound (ELBO) serves as a surrogate. Yet, despite this clean mathematical connection, ELBO-based methods have seen limited adoption due to the prohibitive cost of likelihood evaluation. In this work, we revisit ELBO estimation and disentangle its sources of variance. This decomposition motivates reducing variance through fast, deterministic integral approximations along a few pivotal dimensions. Building on this insight, we introduce \textbf{Group Diffusion Policy Optimization (GDPO)}, a new RL algorithm tailored for DLMs. GDPO leverages simple yet effective Semi-deterministic Monte Carlo schemes to mitigate the variance explosion of ELBO estimators under vanilla double Monte Carlo sampling, yielding a provably lower-variance estimator under tight evaluation budgets. Empirically, GDPO achieves consistent gains over pretrained checkpoints and outperforms diffu-GRPO, one of the state-of-the-art baselines, on the majority of math, reasoning, and coding benchmarks.
LGJan 4, 2025
Reweighting Improves Conditional Risk BoundsYikai Zhang, Jiahe Lin, Fengpei Li et al.
In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general ``balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
MLDec 15, 2024
Prediction-Enhanced Monte Carlo: A Machine Learning View on Control VariateFengpei Li, Haoxian Chen, Jiahe Lin et al.
For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, Monte Carlo simulations often become computationally prohibitive, especially for nested, multi-level, or path-dependent evaluations lacking effective variance reduction techniques. While machine learning (ML) surrogates appear as natural alternatives, naive replacements typically introduce unquantifiable biases. We address this challenge by introducing Prediction-Enhanced Monte Carlo (PEMC), a framework that leverages modern ML models as learned predictors, using cheap and parallelizable simulation as features, to output unbiased evaluation with reduced variance and runtime. PEMC can also be viewed as a "modernized" view of control variates, where we consider the overall computation-cost-aware variance reduction instead of per-replication reduction, while bypassing the closed-form mean function requirement and maintaining the advantageous unbiasedness and uncertainty quantifiability of Monte Carlo. We illustrate PEMC's broader efficacy and versatility through three examples: first, equity derivatives such as variance swaps under stochastic local volatility models; second, interest rate derivatives such as swaption pricing under the Heath-Jarrow-Morton (HJM) interest-rate model. Finally, we showcase PEMC in a socially significant context - ambulance dispatch and hospital load balancing - where accurate mortality rate estimates are key for ethically sensitive decision-making. Across these diverse scenarios, PEMC consistently reduces variance while preserving unbiasedness, highlighting its potential as a powerful enhancement to standard Monte Carlo baselines.
MLApr 23, 2025
Covariate-dependent Graphical Model Estimation via Neural Networks with Statistical GuaranteesJiahe Lin, Yikai Zhang, George Michailidis
Graphical models are widely used in diverse application domains to model the conditional dependencies amongst a collection of random variables. In this paper, we consider settings where the graph structure is covariate-dependent, and investigate a deep neural network-based approach to estimate it. The method allows for flexible functional dependency on the covariate, and fits the data reasonably well in the absence of a Gaussianity assumption. Theoretical results with PAC guarantees are established for the method, under assumptions commonly used in an Empirical Risk Minimization framework. The performance of the proposed method is evaluated on several synthetic data settings and benchmarked against existing approaches. The method is further illustrated on real datasets involving data from neuroscience and finance, respectively, and produces interpretable results.
LGFeb 25, 2024
A VAE-based Framework for Learning Multi-Level Neural Granger-Causal ConnectivityJiahe Lin, Huitian Lei, George Michailidis
Granger causality has been widely used in various application domains to capture lead-lag relationships amongst the components of complex dynamical systems, and the focus in extant literature has been on a single dynamical system. In certain applications in macroeconomics and neuroscience, one has access to data from a collection of related such systems, wherein the modeling task of interest is to extract the shared common structure that is embedded across them, as well as to identify the idiosyncrasies within individual ones. This paper introduces a Variational Autoencoder (VAE) based framework that jointly learns Granger-causal relationships amongst components in a collection of related-yet-heterogeneous dynamical systems, and handles the aforementioned task in a principled way. The performance of the proposed framework is evaluated on several synthetic data settings and benchmarked against existing approaches designed for individual system learning. The method is further illustrated on a real dataset involving time series data from a neurophysiological experiment and produces interpretable results.