MLOct 20, 2022
On Representations of Mean-Field Variational InferenceSoumyadip Ghosh, Yingdong Lu, Tomasz Nowicki et al.
The mean field variational inference (MFVI) formulation restricts the general Bayesian inference problem to the subspace of product measures. We present a framework to analyze MFVI algorithms, which is inspired by a similar development for general variational Bayesian formulations. Our approach enables the MFVI problem to be represented in three different manners: a gradient flow on Wasserstein space, a system of Fokker-Planck-like equations and a diffusion process. Rigorous guarantees are established to show that a time-discretized implementation of the coordinate ascent variational inference algorithm in the product Wasserstein space of measures yields a gradient flow in the limit. A similar result is obtained for their associated densities, with the limit being given by a quasi-linear partial differential equation. A popular class of practical algorithms falls in this framework, which provides tools to establish convergence. We hope this framework could be used to guarantee convergence of algorithms in a variety of approaches, old and new, to solve variational inference problems.
LGMay 7
Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice SettingsShivaram Subramanian, Zhengliang Xue, Markus Ettl et al.
We introduce a causal aware foundation-model framework for real time optimal decision making in discrete choice environments. We propose a constrained triple-head price optimization (C3PO) network to solve a bilevel decision problem in which a service provider selects an optimal assortment while heterogeneous users make personalized acceptance or rejection choices optimizing their own personalized preferences. C3PO integrates imitation learning of prices, multi-task learning of revenue responses, and in context learning of price elasticity to generate pricing recommendations while adhering to business constraints. During inference, frontier model prompting retrieves an enhanced elasticity prior for new products from behavioral economics literature, improving pricing effectiveness. We demonstrate strong in context learning performance using simulated, synthetic, and real-world datasets. C3PO is trained on simulated data generated from multiple classical discrete choice models in economics. The model is trained on data comprising simulated customer segments and counterfactual action and outcome pairs and evaluated on randomly generated choice environments with no access to the underlying preference structure. The trained model consistently improves the pricing KPIs, with gains increasing as customer price sensitivity increases. We also deploy the tuned foundation model for optimal pricing in real-world applications such as healthcare, tender pricing, airline ancillary pricing, and other domains, achieving substantial gains across multiple products, markets, and divisions.
LGAug 8, 2025
In-Context Reinforcement Learning via Communicative World ModelsFernando Martinez-Lopez, Tao Li, Yingdong Lu et al.
Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by decoupling latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not to maximize task reward, but to build a world model and distill its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and successfully perform zero-shot adaptation with the help of pre-trained IA in entirely unseen sparse-reward environments, validating the efficacy of learning a transferable communicative representation.
LGOct 21, 2025
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical FunctionsYanna Ding, Songtao Lu, Yingdong Lu et al.
Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL). Existing theoretical studies on ICL have mainly focused on linear regression tasks, often with i.i.d. inputs. To understand how transformers express ICL when modeling dynamics-driven functions, we investigate Markovian function learning through a structured ICL setup, where we characterize the loss landscape to reveal underlying optimization behaviors. Specifically, we (1) provide the closed-form expression of the global minimizer (in an enlarged parameter space) for a single-layer linear self-attention (LSA) model; (2) prove that recovering transformer parameters that realize the optimal solution is NP-hard in general, revealing a fundamental limitation of one-layer LSA in representing structured dynamical functions; and (3) supply a novel interpretation of a multilayer LSA as performing preconditioned gradient descent to optimize multiple objectives beyond the square loss. These theoretical results are numerically validated using simplified transformers.
LGSep 22, 2025
Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-based Matrix InversionAnton Lebedev, Won Kyung Lee, Soumyadip Ghosh et al.
Large, sparse linear systems are pervasive in modern science and engineering, and Krylov subspace solvers are an established means of solving them. Yet convergence can be slow for ill-conditioned matrices, so practical deployments usually require preconditioners. Markov chain Monte Carlo (MCMC)-based matrix inversion can generate such preconditioners and accelerate Krylov iterations, but its effectiveness depends on parameters whose optima vary across matrices; manual or grid search is costly. We present an AI-driven framework recommending MCMC parameters for a given linear system. A graph neural surrogate predicts preconditioning speed from $A$ and MCMC parameters. A Bayesian acquisition function then chooses the parameter sets most likely to minimise iterations. On a previously unseen ill-conditioned system, the framework achieves better preconditioning with 50\% of the search budget of conventional methods, yielding about a 10\% reduction in iterations to convergence. These results suggest a route for incorporating MCMC-based preconditioners into large-scale systems.
LGAug 10, 2025
Stackelberg Coupling of Online Representation Learning and Reinforcement LearningFernando Martinez, Tao Li, Yingdong Lu et al.
Deep Q-learning jointly learns representations and values within monolithic networks, promising beneficial co-adaptation between features and value estimates. Although this architecture has attained substantial success, the coupling between representation and value learning creates instability as representations must constantly adapt to non-stationary value targets, while value estimates depend on these shifting representations. This is compounded by high variance in bootstrapped targets, which causes bias in value estimation in off-policy methods. We introduce Stackelberg Coupled Representation and Reinforcement Learning (SCORER), a framework for value-based RL that views representation and Q-learning as two strategic agents in a hierarchical game. SCORER models the Q-function as the leader, which commits to its strategy by updating less frequently, while the perception network (encoder) acts as the follower, adapting more frequently to learn representations that minimize Bellman error variance given the leader's committed strategy. Through this division of labor, the Q-function minimizes MSBE while perception minimizes its variance, thereby reducing bias accordingly, with asymmetric updates allowing stable co-adaptation, unlike simultaneous parameter updates in monolithic solutions. Our proposed SCORER framework leads to a bi-level optimization problem whose solution is approximated by a two-timescale algorithm that creates an asymmetric learning dynamic between the two players. Extensive experiments on DQN and its variants demonstrate that gains stem from algorithmic insight rather than model complexity.
AIFeb 20, 2025
SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game DynamicsFernando Martinez-Lopez, Juntao Chen, Yingdong Lu
Deep reinforcement learning agents often face challenges to effectively coordinate perception and decision-making components, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as a leader, strategically processing raw sensory states, while the policy module follows, making decisions based on extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG's effectiveness, achieving around 30% higher returns than standard PPO through its game-theoretical balance of feature extraction and decision-making.
AINov 12, 2024
Federated Learning for Discrete Optimal Transport with Large Population under Incomplete InformationNavpreet Kaur, Juntao Chen, Yingdong Lu
Optimal transport is a powerful framework for the efficient allocation of resources between sources and targets. However, traditional models often struggle to scale effectively in the presence of large and heterogeneous populations. In this work, we introduce a discrete optimal transport framework designed to handle large-scale, heterogeneous target populations, characterized by type distributions. We address two scenarios: one where the type distribution of targets is known, and one where it is unknown. For the known distribution, we propose a fully distributed algorithm to achieve optimal resource allocation. In the case of unknown distribution, we develop a federated learning-based approach that enables efficient computation of the optimal transport scheme while preserving privacy. Case studies are provided to evaluate the performance of our learning algorithm.
STMay 21, 2024
On Convergence of the Alternating Directions SGHMC AlgorithmSoumyadip Ghosh, Yingdong Lu, Tomasz Nowicki
We study convergence rates of Hamiltonian Monte Carlo (HMC) algorithms with leapfrog integration under mild conditions on stochastic gradient oracle for the target distribution (SGHMC). Our method extends standard HMC by allowing the use of general auxiliary distributions, which is achieved by a novel procedure of Alternating Directions. The convergence analysis is based on the investigations of the Dirichlet forms associated with the underlying Markov chain driving the algorithms. For this purpose, we provide a detailed analysis on the error of the leapfrog integrator for Hamiltonian motions with both the kinetic and potential energy functions in general form. We characterize the explicit dependence of the convergence rates on key parameters such as the problem dimension, functional properties of both the target and auxiliary distributions, and the quality of the oracle.
FAFeb 4, 2022
Polynomial convergence of iterations of certain random operators in Hilbert spaceSoumyadip Ghosh, Yingdong Lu, Tomasz J. Nowicki
We study the convergence of a random iterative sequence of a family of operators on infinite dimensional Hilbert spaces, inspired by the Stochastic Gradient Descent (SGD) algorithm in the case of the noiseless regression, as studied in [1]. We identify conditions that are strictly broader than previously known for polynomial convergence rate in various norms, and characterize the roles the randomness plays in determining the best multiplicative constants. Additionally, we prove almost sure convergence of the sequence.
MLOct 21, 2021
Hamiltonian Monte Carlo with Asymmetrical Momentum DistributionsSoumyadip Ghosh, Yingdong Lu, Tomasz Nowicki
Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new analytic and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version that we call the Alternating Direction HMC (AD-HMC). Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. Numerical experiments suggest that AD-HMC can show improved performance over HMC with Gaussian auxiliaries.
COFeb 4, 2021
HMC, an Algorithms in Data Mining, the Functional Analysis approachSoumyadip Ghosh, Yingdong Lu, Tomasz Nowicki
The main purpose of this paper is to facilitate the communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of the Dynamical Systems, where the evolving objects are densities of probability distributions and the tool are derived from the Functional Analysis.
CAJan 21, 2021
On $L^q$ Convergence of the Hamiltonian Monte CarloSoumyadip Ghosh, Yingdong Lu, Tomasz Nowicki
We establish $L_q$ convergence for Hamiltonian Monte Carlo algorithms. More specifically, under mild conditions for the associated Hamiltonian motion, we show that the outputs of the algorithms converge (strongly for $2\le q<\infty$ and weakly for $1<q<2$) to the desired target distribution.
LGMay 28, 2019
A General Markov Decision Process Framework for Directly Learning Optimal Control PoliciesYingdong Lu, Mark S. Squillante, Chai Wah Wu
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of general classes of our control-based RL methods are presented, together with forms of exploration and exploitation in learning and applying the optimal control policy over time. Our general MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish the convergence and optimality-both in general and within various control paradigms (e.g., piecewise linear control policies)-of our control-based methods through this general MDP framework, including convergence of $Q$-learning within the context of our MDP framework. Our empirical results demonstrate and quantify the significant benefits of our approach.
MLMay 21, 2018
A General Family of Robust Stochastic Operators for Reinforcement LearningYingdong Lu, Mark S. Squillante, Chai Wah Wu
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits of our family of operators, significantly outperforming the classical Bellman operator and recently proposed operators.