Guoliang Fan

h-index 26

17 papers

178 citations

Novelty 57%

AI Score 34

Ranked #114,903 of 194,257 authors (top 59%) · #92 in MA (top 50%)

17 Papers

25.1 · AI · Nov 23, 2023

Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

Bin Zhang, Hangyu Mao, Jingqing Ruan et al.

The remarkable progress in Large Language Models (LLMs) opens up new avenues for addressing planning and decision-making problems in Multi-Agent Systems (MAS). However, as the number of agents increases, the issues of hallucination in LLMs and coordination in MAS have become increasingly prominent. Additionally, the efficient utilization of tokens emerges as a critical consideration when employing LLMs to facilitate the interactions among a substantial number of agents. In this paper, we develop a modular framework called LLaMAC to mitigate these challenges. LLaMAC implements a value distribution encoding similar to that found in the human brain, utilizing internal and external feedback mechanisms to facilitate collaboration and iterative reasoning among its modules. Through evaluations involving system resource allocation and robot grid transportation, we demonstrate the considerable advantages afforded by our proposed approach.

9.2 · MA · Apr 20, 2022

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning

Zhiwei Xu, Dapeng Li, Bin Zhang et al.

Recently, model-based agents have achieved better performance than model-free ones using the same computational budget and training time in single-agent environments. However, due to the complexity of multi-agent systems, it is difficult to learn a model of the environment. The significant compounding error may hinder the learning process when model-based methods are applied to multi-agent tasks. This paper proposes an implicit model-based multi-agent reinforcement learning method based on value decomposition methods. Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states in the latent space, giving the agents foresight. Our approach can be applied to any multi-agent value decomposition method. The experimental results show that our method improves the sample efficiency in different partially observable Markov decision process domains.

9.0 · AI · Mar 7, 2022

Efficient Policy Generation in Multi-Agent Systems via Hypergraph Neural Network

Bin Zhang, Yunpeng Bai, Zhiwei Xu et al.

The application of deep reinforcement learning in multi-agent systems introduces extra challenges. In a scenario with numerous agents, one of the most important concerns currently being addressed is how to develop sufficient collaboration between diverse agents. To address this problem, we consider the form of agent interaction based on neighborhood and propose a multi-agent reinforcement learning (MARL) algorithm based on the actor-critic method, which can adaptively construct the hypergraph structure representing the agent interaction and further implement effective information extraction and representation learning through hypergraph convolution networks, leading to effective cooperation. Based on different hypergraph generation methods, we present two variants: Actor Hypergraph Convolutional Critic Network (HGAC) and Actor Attention Hypergraph Critic Network (ATT-HGAC). Experiments with different settings demonstrate the advantages of our approach over other existing methods.

7.3 · MA · Jun 6, 2022

Consensus Learning for Cooperative Multi-Agent Reinforcement Learning

Zhiwei Xu, Bin Zhang, Dapeng Li et al.

Almost all multi-agent reinforcement learning algorithms without communication follow the principle of centralized training with decentralized execution. During centralized training, agents can be guided by the same signals, such as the global state. During decentralized execution, however, agents lack the shared signal. Inspired by viewpoint invariance and contrastive learning, we propose consensus learning for cooperative multi-agent reinforcement learning in this paper. Although based on local observations, different agents can infer the same consensus in discrete space. During decentralized execution, we feed the inferred consensus as an explicit input to the network of agents, thereby developing their spirit of cooperation. Our proposed method can be extended to various multi-agent reinforcement learning algorithms with small model changes. Moreover, we evaluate it on several fully cooperative tasks and obtain convincing results.

8.0 · MA · Apr 28, 2023

From Explicit Communication to Tacit Cooperation: A Novel Paradigm for Cooperative MARL

Dapeng Li, Zhiwei Xu, Bin Zhang et al.

Centralized training with decentralized execution (CTDE) is a widely used learning paradigm that has achieved significant success in complex tasks. However, partial observability issues and the absence of effectively shared signals between agents often limit its effectiveness in fostering cooperation. While communication can address this challenge, it simultaneously reduces the algorithm's practicality. Drawing inspiration from human team cooperative learning, we propose a novel paradigm that facilitates a gradual shift from explicit communication to tacit cooperation. In the initial training stage, we promote cooperation by sharing relevant information among agents and concurrently reconstructing this information using each agent's local trajectory. We then combine the explicitly communicated information with the reconstructed information to obtain mixed information. Throughout the training process, we progressively reduce the proportion of explicitly communicated information, facilitating a seamless transition to fully decentralized execution without communication. Experimental results in various scenarios demonstrate that the performance of our method without communication can approach or even surpass that of QMIX and communication-based methods.
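
The abstract describes mixing communicated and reconstructed information while progressively shrinking the communicated share. The sketch below illustrates that idea under two assumptions that are ours, not the paper's: a linear decay schedule and an elementwise convex combination of equal-length feature vectors. The helper names comm_weight and mix_information are hypothetical.

```python
from typing import List


def comm_weight(step: int, total_steps: int) -> float:
    """Hypothetical schedule: weight on communicated info decays linearly from 1 to 0."""
    if total_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / total_steps)


def mix_information(comm: List[float], recon: List[float],
                    step: int, total_steps: int) -> List[float]:
    """Elementwise convex combination of communicated and reconstructed features."""
    w = comm_weight(step, total_steps)
    return [w * c + (1.0 - w) * r for c, r in zip(comm, recon)]
```

At step 0 the agent sees only the communicated vector; once step reaches total_steps it relies entirely on its own reconstruction, which is what allows execution without communication.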

8.0 · MA · Apr 20, 2023

Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning

Bin Zhang, Lijuan Li, Zhiwei Xu et al.

In multi-agent reinforcement learning (MARL), self-interested agents attempt to establish equilibrium and achieve coordination depending on game structure. However, existing MARL approaches are mostly bound by the simultaneous actions of all agents in the Markov game (MG) framework, and few works consider the formation of equilibrium strategies via asynchronous action coordination. In view of the advantages of Stackelberg equilibrium (SE) over Nash equilibrium, we construct a spatio-temporal sequential decision-making structure derived from the MG and propose an N-level policy model based on a conditional hypernetwork shared by all agents. This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents. Agents can learn heterogeneous SE policies while still maintaining parameter sharing, which leads to reduced cost for learning and storage and enhanced scalability as the number of agents increases. Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios, and performs admirably in immensely complex settings including cooperative tasks and mixed tasks.
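
To show why a Stackelberg equilibrium can benefit the leader compared with a Nash equilibrium, here is a minimal two-player, pure-strategy sketch for a matrix game. It assumes the follower breaks ties in the leader's favour (the "strong" SE convention). It is only an illustration of the equilibrium concept, not the paper's N-level hypernetwork policy model. The function name stackelberg_pure is hypothetical.

```python
from typing import List, Optional, Tuple


def stackelberg_pure(leader_payoff: List[List[float]],
                     follower_payoff: List[List[float]]) -> Optional[Tuple[int, int]]:
    """Pure-strategy Stackelberg equilibrium of a two-player matrix game.

    The leader commits to a row i, the follower observes it and best-responds
    with a column j. Follower ties are broken in the leader's favour.
    """
    best: Optional[Tuple[int, int]] = None
    for i, (lrow, frow) in enumerate(zip(leader_payoff, follower_payoff)):
        top = max(frow)
        responses = [j for j, v in enumerate(frow) if v == top]
        # Among the follower's best responses, pick the one the leader prefers.
        j = max(responses, key=lambda col: lrow[col])
        if best is None or lrow[j] > leader_payoff[best[0]][best[1]]:
            best = (i, j)
    return best
```

In the game with leader payoffs [[2, 4], [1, 3]] and follower payoffs [[1, 0], [0, 1]], row 0 strictly dominates for the leader, so the Nash outcome is (0, 0) with leader payoff 2. Committing to row 1 induces the follower to play column 1, giving the leader 3, so the function returns (1, 1).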

5.3 · LG · Mar 21, 2023

Style Miner: Find Significant and Stable Explanatory Factors in Time Series with Constrained Reinforcement Learning

Dapeng Li, Feiyang Pan, Jia He et al.

In high-dimensional time-series analysis, it is essential to have a set of key factors (namely, the style factors) that explain the change of the observed variable. For example, volatility modeling in finance relies on a set of risk factors, and climate change studies in climatology rely on a set of causal factors. The ideal low-dimensional style factors should balance significance (with high explanatory power) and stability (consistent, no significant fluctuations). However, previous supervised and unsupervised feature extraction methods can hardly address the tradeoff. In this paper, we propose Style Miner, a reinforcement learning method to generate style factors. We first formulate the problem as a Constrained Markov Decision Process with explanatory power as the return and stability as the constraint. Then, we design fine-grained immediate rewards and costs and use a Lagrangian heuristic to balance them adaptively. Experiments on real-world financial data sets show that Style Miner outperforms existing learning-based methods by a large margin and achieves a relative gain of 10% in R-squared explanatory power compared to the industry-renowned factors proposed by human experts.
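
The abstract balances a return (explanatory power) against a constraint (stability) with a Lagrangian heuristic. A common generic way to do this in constrained MDPs is to optimise the scalarised signal r - λc and adjust λ by projected dual ascent. The sketch below shows only that generic scheme; the cost limit, learning rate, and function names are illustrative assumptions, and the paper's exact update rule may differ.

```python
def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Scalarised per-step signal seen by the policy: r - lambda * c."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float,
                      cost_limit: float, lr: float) -> float:
    """Projected dual ascent on lambda.

    lambda grows while the average cost exceeds the limit (constraint violated)
    and shrinks when there is slack, but is never allowed to go below zero.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```

Keeping λ non-negative is what makes the penalty one-sided: it only pushes the policy toward stability when the constraint is actually violated.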

3.3 · MA · Apr 25, 2023

SEA: A Spatially Explicit Architecture for Multi-Agent Reinforcement Learning

Dapeng Li, Zhiwei Xu, Bin Zhang et al.

Spatial information is essential in various fields. Explicitly modeling the spatial locations of agents is also very important for multi-agent problems, especially when the number of agents changes and the scale is enormous. Inspired by the point cloud task in computer vision, we propose a spatial information extraction structure for multi-agent reinforcement learning in this paper. Agents can effectively share neighborhood and global information through a spatial encoder-decoder structure. Our method follows the centralized training with decentralized execution (CTDE) paradigm. In addition, our structure can be applied to various existing mainstream reinforcement learning algorithms with minor modifications and can handle a variable number of agents. Experiments in several multi-agent scenarios show that existing methods achieve convincing results when augmented with our spatially explicit architecture.

3.3 · MA · Feb 4, 2023

Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning

Zhiwei Xu, Bin Zhang, Dapeng Li et al.

Value decomposition methods have gained popularity in the field of cooperative multi-agent reinforcement learning. However, almost all existing methods follow the principle of Individual Global Max (IGM) or its variants, which limits their problem-solving capabilities. To address this, we propose a dual self-awareness value decomposition framework, inspired by the notion of dual self-awareness in psychology, that entirely rejects the IGM premise. Each agent consists of an ego policy for action selection and an alter ego value function to solve the credit assignment problem. The value function factorization can ignore the IGM assumption by utilizing an explicit search procedure. On the basis of the above, we also suggest a novel anti-ego exploration mechanism to avoid the algorithm becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.
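
For readers unfamiliar with the premise the framework rejects: Individual Global Max (IGM) requires that the greedy joint action of the joint value equal the tuple of each agent's greedy individual action. The sketch below only checks that condition for small tabular examples, using a VDN-style additive joint value as one decomposition known to satisfy it. It illustrates the IGM condition itself, not the dual self-awareness framework; the function names are hypothetical and unique maxima are assumed.

```python
from itertools import product
from typing import Dict, List, Tuple

JointQ = Dict[Tuple[int, ...], float]


def additive_joint(q_individual: List[List[float]]) -> JointQ:
    """VDN-style joint value: the sum of per-agent utilities for every joint action."""
    ranges = [range(len(q)) for q in q_individual]
    return {a: sum(q[i] for q, i in zip(q_individual, a)) for a in product(*ranges)}


def satisfies_igm(q_joint: JointQ, q_individual: List[List[float]]) -> bool:
    """True if argmax of the joint value equals the tuple of per-agent argmaxes."""
    joint_greedy = max(q_joint, key=q_joint.get)
    local_greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_individual)
    return joint_greedy == local_greedy
```

An additive decomposition always passes this check, while a joint value whose optimum disagrees with the agents' local preferences fails it; IGM-based methods must restrict their factorisation so that such mismatches cannot arise.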

9.0 · AI · Dec 14, 2023

Adaptive parameter sharing for multi-agent reinforcement learning

Dapeng Li, Na Lou, Bin Zhang et al.

Parameter sharing, as an important technique in multi-agent systems, can effectively solve the scalability issue in large-scale agent problems. However, the effectiveness of parameter sharing largely depends on the environment setting. When agents have different identities or tasks, naive parameter sharing makes it difficult to generate sufficiently differentiated strategies for agents. Inspired by biological research on the brain, we propose a novel parameter sharing method. It maps each type of agent to different regions within a shared network based on their identity, resulting in distinct subnetworks. Therefore, our method can increase the diversity of strategies among different agents without introducing additional training parameters. In experiments conducted in multiple environments, our method has shown better performance than other parameter sharing methods.
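
One way to picture "mapping each agent type to a different region of a shared network without extra trainable parameters" is to gate the shared hidden units with a fixed binary mask derived from the agent's type. The sketch below assumes the mask is a pseudo-random subset of units seeded by the type id; that choice, the keep ratio, and the helper names identity_mask and masked_hidden are our assumptions, not the paper's mapping.

```python
import random
from typing import List


def identity_mask(agent_type: int, width: int, keep: float = 0.5) -> List[int]:
    """Hypothetical fixed binary mask over hidden units, seeded by agent type.

    Agents of the same type always get the same mask, and the mask itself
    is not trained, so no parameters are added to the shared network.
    """
    rng = random.Random(agent_type)
    return [1 if rng.random() < keep else 0 for _ in range(width)]


def masked_hidden(weights: List[List[float]], x: List[float],
                  mask: List[int]) -> List[float]:
    """One shared linear + ReLU layer whose output units are gated by the mask."""
    out = []
    for row, m in zip(weights, mask):
        h = sum(w * xi for w, xi in zip(row, x))
        out.append(m * max(0.0, h))
    return out
```

Because different types activate different subsets of the same weight matrix, gradients from one type only update the units its mask exposes, which is how the subnetworks can diverge while sharing a single parameter set.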

2.3 · AI · Dec 20, 2024 · Code

AIR: Unifying Individual and Collective Exploration in Cooperative Multi-Agent Reinforcement Learning

Guangchong Zhou, Zeren Zhang, Guoliang Fan

Exploration in cooperative multi-agent reinforcement learning (MARL) remains challenging for value-based agents due to the absence of an explicit policy. Existing approaches include individual exploration based on uncertainty towards the system and collective exploration through behavioral diversity among agents. However, the introduction of additional structures often leads to reduced training efficiency and infeasible integration of these methods. In this paper, we propose Adaptive exploration via Identity Recognition (AIR), which consists of two adversarial components: a classifier that recognizes agent identities from their trajectories, and an action selector that adaptively adjusts the mode and degree of exploration. We theoretically prove that AIR can facilitate both individual and collective exploration during training, and experiments also demonstrate the efficiency and effectiveness of AIR across various tasks.

1.2 · MA · Dec 7, 2023

Mastering Complex Coordination through Attention-based Dynamic Graph

Guangchong Zhou, Zhiwei Xu, Zeren Zhang et al.

The coordination between agents in multi-agent systems has become a popular topic in many fields. To capture the relationships between agents, graph structures have been combined with existing methods, improving their results. But in large-scale tasks with numerous agents, an overly complex graph would lead to an increase in computational cost and a decline in performance. Here we present DAGMIX, a novel graph-based value factorization method. Instead of a complete graph, DAGMIX generates a dynamic graph at each time step during training, on which it realizes a more interpretable and effective combining process through the attention mechanism. Experiments show that DAGMIX significantly outperforms previous SOTA methods in large-scale scenarios, as well as achieving promising results on other tasks.

3.3 · MA · May 13, 2023

Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems

Bin Zhang, Hangyu Mao, Lijuan Li et al.

Asynchronous action coordination presents a pervasive challenge in Multi-Agent Systems (MAS), which can be represented as a Stackelberg game (SG). However, the scalability of existing Multi-Agent Reinforcement Learning (MARL) methods based on SG is severely constrained by network structures or environmental limitations. To address this issue, we propose the Stackelberg Decision Transformer (STEER), a heuristic approach that resolves the difficulties of hierarchical coordination among agents. STEER efficiently manages decision-making processes in both spatial and temporal contexts by incorporating the hierarchical decision structure of SG, the modeling capability of autoregressive sequence models, and the exploratory learning methodology of MARL. Our research contributes to the development of an effective and adaptable asynchronous action coordination method that can be widely applied to various task types and environmental configurations in MAS. Experimental results demonstrate that our method can converge to Stackelberg equilibrium solutions and outperforms other existing methods in complex scenarios.

4.5 · AI · Dec 9, 2021 · Code

Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution

Yunpeng Bai, Chen Gong, Bin Zhang et al.

Recent years have witnessed the great success of multi-agent systems (MAS). Value decomposition, which decomposes joint action values into individual action values, has been an important line of work in MAS. However, many value decomposition methods ignore the coordination among different agents, leading to the notorious "lazy agents" problem. To enhance the coordination in MAS, this paper proposes HyperGraph CoNvolution MIX (HGCN-MIX), a method that incorporates hypergraph convolution with value decomposition. HGCN-MIX models agents as well as their relationships as a hypergraph, where agents are nodes and hyperedges among nodes indicate that the corresponding agents can coordinate to achieve larger rewards. Then, it trains a hypergraph that can capture the collaborative relationships among agents. Leveraging the learned hypergraph to consider how other agents' observations and actions affect their decisions, the agents in a MAS can better coordinate. We evaluate HGCN-MIX in the StarCraft II multi-agent challenge benchmark. The experimental results demonstrate that HGCN-MIX can train joint policies that outperform or achieve a similar level of performance as the current state-of-the-art techniques. We also observe that HGCN-MIX has an even more significant improvement of performance in the scenarios with a large number of agents. Besides, we conduct additional analysis to emphasize that when the hypergraph learns more relationships, HGCN-MIX can train stronger joint policies.
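
The abstract does not spell out its convolution operator, so as an assumption the sketch below uses the widely used HGNN-style hypergraph propagation X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X, without the learnable projection matrix. Here the incidence matrix H is a fixed input, whereas HGCN-MIX learns its hypergraph; the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def hypergraph_conv(H: Matrix, X: Matrix, edge_w: List[float]) -> Matrix:
    """One HGNN-style propagation step without the learnable projection:
    X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X.

    H[v][e] = 1 if node (agent) v belongs to hyperedge e. Assumes every node
    lies in at least one positively weighted hyperedge and no hyperedge is empty.
    """
    n, m, d = len(H), len(H[0]), len(X[0])
    dv = [sum(edge_w[e] * H[v][e] for e in range(m)) for v in range(n)]  # node degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]              # hyperedge sizes
    # Dv^-1/2 X: degree normalisation on the node side.
    xn = [[x / math.sqrt(dv[v]) for x in X[v]] for v in range(n)]
    # De^-1 H^T: average the member nodes of each hyperedge into one message.
    msg = [[sum(H[v][e] * xn[v][k] for v in range(n)) / de[e] for k in range(d)]
           for e in range(m)]
    # Dv^-1/2 H W: send weighted hyperedge messages back to their member nodes.
    return [[sum(H[v][e] * edge_w[e] * msg[e][k] for e in range(m)) / math.sqrt(dv[v])
             for k in range(d)]
            for v in range(n)]
```

With a single unit-weight hyperedge covering all agents, every agent receives the mean of all agents' features; with disjoint hyperedges, information only mixes within each group, which is how the hyperedges encode which agents coordinate.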

5.5 · LG · Sep 24, 2021

The $f$-Divergence Reinforcement Learning Framework

Chen Gong, Qiang He, Yunpeng Bai et al.

The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. This paper presents a novel DRL framework, termed $f$-Divergence Reinforcement Learning (FRL). In FRL, the policy evaluation and policy improvement phases are simultaneously performed by minimizing the $f$-divergence between the learning policy and sampling policy, which is distinct from conventional DRL algorithms that aim to maximize the expected cumulative rewards. We theoretically prove that minimizing such $f$-divergence can make the learning policy converge to the optimal policy. Besides, we convert the process of training agents in the FRL framework to a saddle-point optimization problem with a specific $f$ function through the Fenchel conjugate, which forms new methods for policy evaluation and policy improvement. Through mathematical proofs and empirical evaluation, we demonstrate that the FRL framework has two advantages: (1) policy evaluation and policy improvement processes are performed simultaneously and (2) the issue of overestimating the value function is naturally alleviated. To evaluate the effectiveness of the FRL framework, we conduct experiments on Atari 2600 video games and show that agents trained in the FRL framework match or surpass the baseline DRL algorithms.
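
For reference, the quantity FRL minimises is the standard $f$-divergence, which for discrete distributions is $D_f(P \| Q) = \sum_x q(x) f(p(x)/q(x))$ with convex $f$ and $f(1) = 0$. The sketch below computes only this textbook definition for two common generators; it does not reproduce the paper's Fenchel-conjugate saddle-point training, and it assumes $q(x) > 0$ everywhere.

```python
import math
from typing import Callable, Sequence


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    """f(t) = t log t, which turns D_f into the KL divergence KL(P || Q)."""
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t: float) -> float:
    """f(t) = |t - 1| / 2, which turns D_f into the total variation distance."""
    return 0.5 * abs(t - 1.0)
```

Swapping the generator changes which discrepancy between the learning and sampling policies is penalised, which is the degree of freedom the framework exploits when it picks "a specific $f$ function".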

5.9 · MA · May 13, 2021

SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning

Zhiwei Xu, Yunpeng Bai, Dapeng Li et al.

As one of the solutions to decentralized partially observable Markov decision process (Dec-POMDP) problems, the value decomposition method has achieved significant results recently. However, most value decomposition methods require the fully observable state of the environment during training, which is not feasible in some scenarios where only incomplete and noisy observations can be obtained. Therefore, we propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need to know the global state by simultaneously seeking solutions to the two problems of optimal control and state inference. SIDE can be extended to any value decomposition method to tackle partially observable problems. By comparing its performance with that of different algorithms on StarCraft II micromanagement tasks, we verify that, even without access to the state, SIDE can infer a current state that contributes to the reinforcement learning process from past local observations, and can even achieve superior results to many baselines in some complex scenarios.

1.7 · CV · Mar 5, 2018

A generalized parametric 3D shape representation for articulated pose estimation

Meng Ding, Guoliang Fan

We present a novel parametric 3D shape representation, Generalized sum of Gaussians (G-SoG), which is particularly suitable for pose estimation of articulated objects. Compared with the original sum-of-Gaussians (SoG), G-SoG can handle both isotropic and anisotropic Gaussians, leading to a more flexible and adaptable shape representation with far fewer anisotropic Gaussians involved. An articulated shape template can be developed by embedding G-SoG in a tree-structured skeleton model to represent an articulated object. We further derive a differentiable similarity function between G-SoG (the template) and SoG (observed data) that can be optimized analytically for efficient pose estimation. The experimental results on a standard human pose estimation dataset show the effectiveness and advantages of G-SoG over the original SoG, as well as its promise compared with recent algorithms that use more complicated shape models.
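
The analytic similarity rests on the fact that the overlap integral of two Gaussians has a closed form. The sketch below covers only the original isotropic SoG case with unnormalised Gaussians $\exp(-\|x-\mu\|^2 / 2\sigma^2)$ in 3D, whose pairwise overlap is $(2\pi\sigma_1^2\sigma_2^2/(\sigma_1^2+\sigma_2^2))^{3/2} \exp(-\|\mu_1-\mu_2\|^2 / 2(\sigma_1^2+\sigma_2^2))$. The anisotropic G-SoG terms and any weighting used in the paper's similarity function are not reproduced; the function names are hypothetical.

```python
import math
from typing import List, Tuple

# (mean, sigma) of an unnormalised isotropic 3D Gaussian.
Gaussian = Tuple[Tuple[float, float, float], float]


def overlap(g1: Gaussian, g2: Gaussian) -> float:
    """Closed-form integral over R^3 of the product of two isotropic Gaussians."""
    (m1, s1), (m2, s2) = g1, g2
    var = s1 * s1 + s2 * s2
    d2 = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return (2 * math.pi * s1 * s1 * s2 * s2 / var) ** 1.5 * math.exp(-d2 / (2 * var))


def sog_similarity(template: List[Gaussian], observed: List[Gaussian]) -> float:
    """Sum of pairwise overlaps between two sums of isotropic Gaussians."""
    return sum(overlap(a, b) for a in template for b in observed)
```

Because every term is a smooth function of the means and widths, the similarity can be differentiated with respect to the template's pose parameters, which is what makes analytic optimisation possible.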