LG Nov 24, 2023
Directly Attention Loss Adjusted Prioritized Experience Replay
Zhuoying Chen, Huiping Li, Zhaoxu Wang
Prioritized Experience Replay (PER) lets the model learn more from relatively important samples by artificially changing how often they are sampled. However, this non-uniform sampling shifts the state-action distribution that is originally used to estimate Q-value functions, which introduces estimation error. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network so as to accurately compensate for the resulting error. In addition, a Priority-Encouragement mechanism is designed to optimize the sample screening criterion and further improve training efficiency. To verify the effectiveness and generality of DALAP, we integrate it with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms. Multiple groups of comparative experiments show that DALAP significantly improves the convergence rate and reduces the training variance.
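DALAP's Parallel Self-Attention network is not reproduced here. As a hedged illustration of the baseline it corrects, the sketch below implements standard proportional PER: priorities $p_i$ become sampling probabilities $P(i) = p_i^\alpha / \sum_k p_k^\alpha$, and importance-sampling weights $w_i = (N\,P(i))^{-\beta}$, normalized by their maximum, partially undo the distribution shift. The function names, the priorities, and the defaults `alpha=0.6`, `beta=0.4` are illustrative assumptions, not values from the paper.

```python
import random


def per_probabilities(priorities, alpha):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta):
    """w_i = (N * P(i))^(-beta), divided by the largest weight so that max w_i = 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    return [w / w_max for w in raw]


def sample_batch(priorities, batch_size, alpha=0.6, beta=0.4, seed=0):
    """Draw indices in proportion to P(i) and return them with their IS weights."""
    probs = per_probabilities(priorities, alpha)
    weights = importance_weights(probs, beta)
    rng = random.Random(seed)
    idx = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return idx, [weights[i] for i in idx]


# Hypothetical priorities, e.g. absolute TD errors plus a small constant.
priorities = [0.1, 0.5, 2.0, 1.0]
indices, weights = sample_batch(priorities, batch_size=3)
```

Setting `alpha=0` recovers uniform replay, and `beta=1` fully compensates the non-uniform sampling in expectation. Standard PER anneals `beta` toward 1 on a fixed schedule; the attention-based methods in this listing aim to set this correction more accurately.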
IT Apr 15
Weighted Riemannian Optimization for Solving Quadratic Equations from Gaussian Magnitude Measurements
Jianfeng Cai, Huiping Li, Jiayi Li
This paper explores the problem of generalized phase retrieval, which involves reconstructing a length-$n$ signal $\bm{x}$ from its $m$ phaseless samples $y_k = \left|\langle \bm{a}_k,\bm{x}\rangle\right|^2$, where $k = 1,2,\ldots,m$, and $\bm{a}_k$ are the measurement vectors. This problem can be reformulated as recovering a positive semidefinite rank-$1$ matrix $\bm{X}=\bm{x}\bm{x}^*$ from linear samples $\bm{y}=\mathcal{A}(\bm{X})\in\mathbb{R}^m$, which requires finding a rank-$1$ solution of the linear equations. We demonstrate that several existing phase retrieval algorithms, including Wirtinger Flow (WF) and the canonical Riemannian gradient descent (RGD), actually solve the least-squares fitting of these linear equations on the Riemannian manifold of rank-$1$ matrices, but under different metrics on this manifold. However, under these metrics $\mathcal{A}$ embeds rank-$1$ matrices into $\mathbb{R}^m$ stably but far from isometrically, resulting in linear convergence with a considerably large convergence factor. To accelerate convergence, we establish a new metric on the rank-$1$ matrix manifold under which $\mathcal{A}$ embeds rank-$1$ matrices into $\mathbb{R}^m$ nearly isometrically. An RGD algorithm under this new metric, termed Weighted RGD (WRGD), is proposed to tackle the phase retrieval problem. Owing to the near isometry, we prove that our WRGD algorithm, initialized by spectral methods, converges linearly to the underlying signal $\bm{x}$ with a small convergence factor. Empirical experiments validate the efficiency and resilience of our algorithm compared with the truncated Wirtinger Flow (TWF) algorithm and the canonical RGD algorithm.
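As a hedged illustration of the kind of baseline the abstract compares against, the sketch below runs plain gradient descent on the least-squares loss $f(\bm{z}) = \frac{1}{4m}\sum_k \big((\bm{a}_k^\top\bm{z})^2 - y_k\big)^2$ for real Gaussian measurements, initialized by the spectral method (leading eigenvector of $\frac{1}{m}\sum_k y_k\bm{a}_k\bm{a}_k^\top$, scaled by $\sqrt{\frac{1}{m}\sum_k y_k}$). This is essentially real-valued Wirtinger Flow, not WRGD: the weighted metric is not implemented. The problem sizes, the step parameter `mu`, and the iteration counts are assumptions chosen for a small example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_problem(n, m, seed=0):
    """Real signal x and Gaussian vectors a_k; measurements y_k = <a_k, x>^2."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [dot(a, x) ** 2 for a in A]
    return x, A, y


def spectral_init(A, y, iters=200, seed=1):
    """Power iteration on Y = (1/m) sum_k y_k a_k a_k^T, scaled by sqrt(mean(y))."""
    m, n = len(A), len(A[0])
    Y = [[sum(y[k] * A[k][i] * A[k][j] for k in range(m)) / m for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [dot(row, v) for row in Y]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    scale = math.sqrt(sum(y) / m)  # E[y_k] = ||x||^2 for standard Gaussian a_k
    return [scale * vi for vi in v]


def wirtinger_flow(A, y, iters=500, mu=0.1):
    """Gradient descent: grad f(z) = (1/m) sum_k ((a_k^T z)^2 - y_k)(a_k^T z) a_k."""
    m = len(A)
    z = spectral_init(A, y)
    step = mu / dot(z, z)
    for _ in range(iters):
        grad = [0.0] * len(z)
        for a, yk in zip(A, y):
            az = dot(a, z)
            r = (az * az - yk) * az / m
            for i, ai in enumerate(a):
                grad[i] += r * ai
        z = [zi - step * gi for zi, gi in zip(z, grad)]
    return z


def distance(z, x):
    """The global sign of x cannot be recovered from |<a_k, x>|^2."""
    d_plus = math.sqrt(sum((zi - xi) ** 2 for zi, xi in zip(z, x)))
    d_minus = math.sqrt(sum((zi + xi) ** 2 for zi, xi in zip(z, x)))
    return min(d_plus, d_minus)


x, A, y = make_problem(n=6, m=120)
z = wirtinger_flow(A, y)
print(distance(z, x) / math.sqrt(dot(x, x)))  # relative error, up to a global sign
```

The fixed step `mu / ||z_0||^2` keeps the example short; the contraction per step is governed by how far the embedding is from isometric, which is the quantity the weighted metric in the abstract is designed to improve.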
LG Sep 13, 2023
Attention Loss Adjusted Prioritized Experience Replay
Zhuoying Chen, Huiping Li, Rizhong Wang
Prioritized Experience Replay (PER) is a deep reinforcement learning technique that speeds up neural network training by selecting experience samples that carry more information. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and introduces estimation error into the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved Self-Attention network with a Double-Sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, thereby eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
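Why the importance-sampling hyperparameter matters can be seen from a short calculation, written here for standard PER under its usual definitions (an assumption; ALAP's learned adjustment is not modeled). With $N$ stored transitions sampled with probability $P(i)$ and unnormalized weights $w_i = (N\,P(i))^{-\beta}$, the weighted expectation of a per-sample loss term $\ell_i$ is:

```latex
\mathbb{E}_{i \sim P}\left[w_i\,\ell_i\right]
  = \sum_{i=1}^{N} P(i)\,\bigl(N\,P(i)\bigr)^{-\beta}\,\ell_i
  = \frac{1}{N^{\beta}} \sum_{i=1}^{N} P(i)^{1-\beta}\,\ell_i ,
\qquad
\beta = 1 \;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N} \ell_i = \mathbb{E}_{i \sim \mathcal{U}}\left[\ell_i\right].
```

With $\beta = 0$ the loss is weighted by $P(i)$ itself, which is exactly the distribution shift described above; intermediate values correct it only partially. Normalizing by $\max_j w_j$ rescales the expression by a constant and does not change this picture.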
AI Nov 1, 2023
QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning
Rizhong Wang, Huiping Li, Di Cui et al.
Centralized training is widely used in multi-agent reinforcement learning (MARL) to ensure a stable training process. Once a joint policy is obtained, it is critical to design a value function factorization method that extracts optimal decentralized policies for the agents; such a factorization must satisfy the individual-global-max (IGM) principle. Imposing additional restrictions on the IGM function class can help meet this requirement, but at the cost of limiting its application to more complex multi-agent environments. In this paper, we propose QFree, a universal value function factorization method for MARL. We start by deriving mathematically equivalent conditions of the IGM principle based on the advantage function, which ensure that the principle holds without any compromise and remove the conservatism of conventional methods. We then establish a more expressive mixing network architecture that can fulfill the equivalent factorization. In particular, a novel loss function is developed that treats the equivalent conditions as a regularization term during policy evaluation in the MARL algorithm. Finally, the effectiveness of the proposed method is verified in a nonmonotonic matrix game scenario. Moreover, we show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment, the StarCraft Multi-Agent Challenge (SMAC).
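The IGM principle requires the greedy joint action of $Q_{tot}$ to coincide with the tuple of each agent's individual greedy actions. The minimal two-agent sketch below checks that condition on tabular values. The payoff matrix and the individual utilities are hypothetical numbers chosen for illustration, not results from the paper, and ties are assumed away. It shows that an additive factorization satisfies IGM by construction, while a nonmonotonic payoff paired with mismatched utilities does not.

```python
from itertools import product


def argmax(values):
    """Index of the largest entry (assumes no ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def satisfies_igm(q_tot, q_agents):
    """q_tot maps a joint action tuple to its value; q_agents[k][a] is agent k's utility."""
    joint_greedy = max(q_tot, key=q_tot.get)
    individual_greedy = tuple(argmax(q) for q in q_agents)
    return joint_greedy == individual_greedy


# An additive factorization Q_tot(a1, a2) = Q1(a1) + Q2(a2) satisfies IGM by construction.
q1, q2 = [1.0, 2.0, 0.0], [0.0, 3.0, 1.0]
q_additive = {(a1, a2): q1[a1] + q2[a2] for a1, a2 in product(range(3), repeat=2)}

# Hypothetical nonmonotonic payoff: the best joint action is (0, 0), but action 0
# is heavily penalized when the other agent deviates.
payoff = [[8.0, -12.0, -12.0], [-12.0, 0.0, 0.0], [-12.0, 0.0, 0.0]]
q_payoff = {(a1, a2): payoff[a1][a2] for a1, a2 in product(range(3), repeat=2)}

# Hypothetical individual utilities that prefer the "safe" action 1.
u1, u2 = [-6.0, 0.5, 0.0], [-6.0, 0.5, 0.0]

print(satisfies_igm(q_additive, [q1, q2]))  # additive case
print(satisfies_igm(q_payoff, [u1, u2]))    # mismatched utilities
```

IGM constrains only the greedy actions, not the values themselves, so utilities that rank action 0 first for both agents would satisfy it on the same payoff.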
LG Jan 13, 2025
TIMRL: A Novel Meta-Reinforcement Learning Framework for Non-Stationary and Multi-Task Environments
Chenyang Qi, Huiping Li, Panfeng Huang
In recent years, meta-reinforcement learning (meta-RL) algorithms have been proposed to improve sample efficiency in decision-making and control, enabling agents to learn new knowledge from a small number of samples. However, most existing work uses a Gaussian distribution to extract the task representation, which adapts poorly to tasks that change in non-stationary environments. To address this problem, we propose a novel meta-RL method that leverages a Gaussian mixture model and a transformer network to construct a task inference model. The Gaussian mixture model is used to extend the task representation and to encode tasks explicitly. Specifically, the transformer network encodes the task class to determine the Gaussian component corresponding to each task. Because task labels are available, the transformer network is trained with supervised learning. We validate our method on MuJoCo benchmarks with non-stationary and multi-task environments. Experimental results demonstrate that the proposed method substantially improves sample efficiency and accurately recognizes the task classes, while achieving strong performance in these environments.
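The task-inference step can be sketched as follows, under loudly stated assumptions: a classifier (standing in for the transformer, which is not implemented) outputs logits over $K$ task classes, the supervised loss is cross-entropy against the task label, and the predicted class selects one diagonal Gaussian component from which a task latent is drawn via the reparameterization $z = \mu_k + \sigma_k \odot \epsilon$. All names, shapes, and numbers are hypothetical; the paper's actual architecture and losses may differ.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss for the (hypothetical) task classifier: -log softmax(logits)[label]."""
    return -math.log(softmax(logits)[label])


def sample_task_latent(logits, means, stds, rng):
    """Pick the most probable component k, then sample z = mu_k + sigma_k * eps."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=lambda i: probs[i])
    z = [mu + sd * rng.gauss(0.0, 1.0) for mu, sd in zip(means[k], stds[k])]
    return k, z


# Hypothetical 3-component mixture over a 2-D task latent space.
means = [[0.0, 0.0], [3.0, -1.0], [-2.0, 4.0]]
stds = [[0.5, 0.5], [0.3, 0.3], [0.4, 0.4]]
logits = [0.2, 2.5, -1.0]  # stand-in for the classifier's output on one context
rng = random.Random(0)
k, z = sample_task_latent(logits, means, stds, rng)
loss = cross_entropy(logits, label=1)
```

A hard argmax ties each latent to a single component, which keeps task classes separated; weighting components by their probabilities instead would give a soft mixture.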