Jiamin Zhu

LG
h-index6
5papers
141citations
Novelty39%
AI Score41

5 Papers

11.1LGMay 5
Structural Equivalence and Learning Dynamics in Delayed MARL

Jules Sintes, Ana Bušić, Jiamin Zhu

We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state. However, we show through numerical experiments that although the optimal solution spaces are structurally isomorphic, the practical learning dynamics are fundamentally different. First, using the minimal local augmented state, the equivalence no longer holds when transitions are not independent. Second, operational constraints and causal credit-assignment errors in Temporal Difference (TD) algorithms induce different learning behaviors across regimes. Finally, leveraging this structural equivalence to bypass these learning challenges, we demonstrate successful multi-agent zero-shot policy transfer from OD to AD, paving the way for unified, efficient solution methods in complex delayed systems.

LGJan 23, 2025
WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control

Claire Bizon Monroc, Ana Bušić, Donatien Dubuc et al.

The wind farm control problem is challenging, since conventional model-based control strategies require tractable models of complex aerodynamical interactions between the turbines and suffer from the curse of dimension when the number of turbines increases. Recently, model-free and multi-agent reinforcement learning approaches have been used to address this challenge. In this article, we introduce WFCRL (Wind Farm Control with Reinforcement Learning), the first open suite of multi-agent reinforcement learning environments for the wind farm control problem. WFCRL frames a cooperative Multi-Agent Reinforcement Learning (MARL) problem: each turbine is an agent and can learn to adjust its yaw, pitch or torque to maximize the common objective (e.g. the total power production of the farm). WFCRL also offers turbine load observations that will allow to optimize the farm performance while limiting turbine structural damages. Interfaces with two state-of-the-art farm simulators are implemented in WFCRL: a static simulator (FLORIS) and a dynamic simulator (FAST.Farm). For each simulator, $10$ wind layouts are provided, including $5$ real wind farms. Two state-of-the-art online MARL algorithms are implemented to illustrate the scaling challenges. As learning online on FAST.Farm is highly time-consuming, WFCRL offers the possibility of designing transfer learning strategies from FLORIS to FAST.Farm.

LGJun 7, 2024
Reinforcement Learning and Regret Bounds for Admission Control

Lucas Weber, Ana Bušić, Jiamin Zhu

The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.

CVOct 10, 2021
Deep learning-based person re-identification methods: A survey and outlook of recent works

Zhangqiang Ming, Min Zhu, Xiangkun Wang et al.

In recent years, with the increasing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in the computer vision field. The main research goal of person Re-ID is to retrieve persons with the same identity from different cameras. However, traditional person Re-ID methods require manual marking of person targets, which consumes a lot of labor cost. With the widespread application of deep neural networks, many deep learning-based person Re-ID methods have emerged. Therefore, this paper is to facilitate researchers to understand the latest research results and the future trends in the field. Firstly, we summarize the studies of several recently published person Re-ID surveys and complement the latest research methods to systematically classify deep learning-based person Re-ID methods. Secondly, we propose a multi-dimensional taxonomy that classifies current deep learning-based person Re-ID methods into four categories according to metric and representation learning, including methods for deep metric learning, local feature learning, generative adversarial learning and sequence feature learning. Furthermore, we subdivide the above four categories according to their methodologies and motivations, discussing the advantages and limitations of part subcategories. Finally, we discuss some challenges and possible research directions for person Re-ID.

QUANT-PHAug 9, 2019
Composable and Finite Computational Security of Quantum Message Transmission

Fabio Banfi, Ueli Maurer, Christopher Portmann et al.

Recent research in quantum cryptography has led to the development of schemes that encrypt and authenticate quantum messages with computational security. The security definitions used so far in the literature are asymptotic, game-based, and not known to be composable. We show how to define finite, composable, computational security for secure quantum message transmission. The new definitions do not involve any games or oracles, they are directly operational: a scheme is secure if it transforms an insecure channel and a shared key into an ideal secure channel from Alice to Bob, i.e., one which only allows Eve to block messages and learn their size, but not change them or read them. By modifying the ideal channel to provide Eve with more or less capabilities, one gets an array of different security notions. By design these transformations are composable, resulting in composable security. Crucially, the new definitions are finite. Security does not rely on the asymptotic hardness of a computational problem. Instead, one proves a finite reduction: if an adversary can distinguish the constructed (real) channel from the ideal one (for some fixed security parameters), then she can solve a finite instance of some computational problem. Such a finite statement is needed to make security claims about concrete implementations. We then prove that (slightly modified versions of) protocols proposed in the literature satisfy these composable definitions. And finally, we study the relations between some game-based definitions and our composable ones. In particular, we look at notions of quantum authenticated encryption and QCCA2, and show that they suffer from the same issues as their classical counterparts: they exclude certain protocols which are arguably secure.