Zhaolin Ren

LG
h-index10
10papers
96citations
Novelty52%
AI Score35

10 Papers

LGSep 8, 2022
FedDAR: Federated Domain-Aware Representation Learning

Aoxiao Zhong, Hao He, Zhaolin Ren et al.

Cross-silo Federated learning (FL) has become a promising tool in machine learning applications for healthcare. It allows hospitals/institutions to train models with sufficient data while the data is kept private. To make sure the FL model is robust when facing heterogeneous data among FL clients, most efforts focus on personalizing models for clients. However, the latent relationships between clients' data are ignored. In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity of domains and the similarity within domains, we propose a novel method, FedDAR, which learns a domain shared representation and domain-wise personalized prediction heads in a decoupled manner. For simplified linear regression settings, we have theoretically proved that FedDAR enjoys a linear convergence rate. For general settings, we have performed intensive empirical studies on both synthetic and real-world medical datasets which demonstrate its superiority over prior FL methods.

LGSep 9, 2024
Enhancing Preference-based Linear Bandits via Human Response Time

Shen Li, Yuyang Zhang, Zhaolin Ren et al.

Interactive preference learning systems infer human preferences by presenting queries as pairs of options and collecting binary choices. Although binary choices are simple and widely used, they provide limited information about preference strength. To address this, we leverage human response times, which are inversely related to preference strength, as an additional signal. We propose a computationally efficient method that combines choices and response times to estimate human utility functions, grounded in the EZ diffusion model from psychology. Theoretical and empirical analyses show that for queries with strong preferences, response times complement choices by providing extra information about preference strength, leading to significantly improved utility estimation. We incorporate this estimator into preference-based linear bandits for fixed-budget best-arm identification. Simulations on three real-world datasets demonstrate that using response times significantly accelerates preference learning compared to choice-only approaches. Additional materials, such as code, slides, and talk video, are available at https://shenlirobot.github.io/pages/NeurIPS24.html

LGApr 8, 2023
Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

Zhaolin Ren, Tongzheng Ren, Haitong Ma et al.

This paper proposes an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method reveals an infinite-dimensional feature representation induced by the system's nonlinear stochastic dynamics, enabling a linear representation of the state-action value function. For practical implementation, this representation is approximated using finite-dimensional truncations, specifically via two prominent kernel approximation methods: random feature truncation and Nystrom approximation. To characterize the effectiveness of these approximations, we provide an in-depth theoretical analysis to characterize the approximation error arising from the finite-dimension truncation and statistical error due to finite-sample approximation in both policy evaluation and policy optimization. Empirically, our algorithm performs favorably against existing stochastic control algorithms on several benchmark problems.

LGApr 7, 2024
Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Haitong Ma, Zhaolin Ren, Bo Dai et al.

We study sim-to-real skill transfer and discovery in the context of robotics control using representation learning. We draw inspiration from spectral decomposition of Markov decision processes. The spectral decomposition brings about representation that can linearly represent the state-action value function induced by any policies, thus can be regarded as skills. The skill representations are transferable across arbitrary tasks with the same transition dynamics. Moreover, to handle the sim-to-real gap in the dynamics, we propose a skill discovery algorithm that learns new skills caused by the sim-to-real gap from real-world data. We promote the discovery of new skills by enforcing orthogonal constraints between the skills to learn and the skills from simulators, and then synthesize the policy using the enlarged skill sets. We demonstrate our methodology by transferring quadrotor controllers from simulators to Crazyflie 2.1 quadrotors. We show that we can learn the skill representations from a single simulator task and transfer these to multiple different real-world tasks including hovering, taking off, landing and trajectory tracking. Our skill discovery approach helps narrow the sim-to-real gap and improve the real-world controller performance by up to 30.2%.

LGOct 21, 2024
Distributed Thompson sampling under constrained communication

Saba Zerefa, Zhaolin Ren, Haitong Ma et al.

In Bayesian optimization, a black-box function is maximized via the use of a surrogate model. We apply distributed Thompson sampling, using a Gaussian process as a surrogate model, to approach the multi-agent Bayesian optimization problem. In our distributed Thompson sampling implementation, each agent receives sampled points from neighbors, where the communication network is encoded in a graph; each agent utilizes their own Gaussian process to model the objective function. We demonstrate theoretical bounds on Bayesian average regret and Bayesian simple regret, where the bound depends on the structure of the communication graph. Unlike in batch Bayesian optimization, this bound is applicable in cases where the communication graph amongst agents is constrained. When compared to sequential single-agent Thompson sampling, our bound guarantees faster convergence with respect to time as long as the communication graph is connected. We confirm the efficacy of our algorithm with numerical simulations on traditional optimization test functions, demonstrating the significance of graph connectivity on improving regret convergence.

LGMar 7, 2024
TS-RSR: A provably efficient approach for batch Bayesian Optimization

Zhaolin Ren, Na Li

This paper presents a new approach for batch Bayesian Optimization (BO) called Thompson Sampling-Regret to Sigma Ratio directed sampling (TS-RSR), where we sample a new batch of actions by minimizing a Thompson Sampling approximation of a regret to uncertainty ratio. Our sampling objective is able to coordinate the actions chosen in each batch in a way that minimizes redundancy between points whilst focusing on points with high predictive means or high uncertainty. Theoretically, we provide rigorous convergence guarantees on our algorithm's regret, and numerically, we demonstrate that our method attains state-of-the-art performance on a range of challenging synthetic and realistic test functions, where it outperforms several competitive benchmark batch BO algorithms.

ROAug 20, 2025
Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations

Haitong Ma, Bo Dai, Zhaolin Ren et al.

Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected with the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited data issues. We present a tractable loss function inspired by noise contrastive estimation to learn the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on real quadrupeds show that we can leverage pre-trained dynamics representations from simulator data to learn to walk from a few real-world demonstrations.

MAOct 22, 2024
Scalable spectral representations for multi-agent reinforcement learning in network MDPs

Zhaolin Ren, Runyu Zhang, Bo Dai et al.

Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning due to the exponential growth of the global state-action space with the number of agents. In this work, utilizing the exponential decay property of network dynamics, we first derive scalable spectral local representations for network MDPs, which induces a network linear subspace for the local $Q$-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate the advantages of our approach over generic function approximation approaches to representing the local $Q$-functions.

LGJun 1, 2021
Gradient play in stochastic games: stationary points, convergence, and sample complexity

Runyu Zhang, Zhaolin Ren, Na Li

We study the performance of the gradient play algorithm for stochastic games (SGs), where each agent tries to maximize its own total discounted reward by making decisions independently based on current state information which is shared between agents. Policies are directly parameterized by the probability of choosing a certain action at a given state. We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting, and give a local convergence rate around strict NEs. Further, for a subclass of SGs called Markov potential games (which includes the setting with identical rewards as an important special case), we design a sample-based reinforcement learning algorithm and give a non-asymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm. Our result shows that the number of iterations to reach an $ε$-NE scales linearly, instead of exponentially, with the number of agents. Local geometry and local stability are also considered, where we prove that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points.

OCNov 3, 2020
LQR with Tracking: A Zeroth-order Approach and Its Global Convergence

Zhaolin Ren, Aoxiao Zhong, Na Li

There has been substantial recent progress on the theoretical understanding of model-free approaches to Linear Quadratic Regulator (LQR) problems. Much attention has been devoted to the special case when the goal is to drive the state close to a zero target. In this work, we consider the general case where the target is allowed to be arbitrary, which we refer to as the LQR tracking problem. We study the optimization landscape of this problem, and show that similar to the zero-target LQR problem, the LQR tracking problem also satisfies gradient dominance and local smoothness properties. This allows us to develop a zeroth-order policy gradient algorithm that achieves global convergence. We support our arguments with numerical simulations on a linear system.