Xu He

h-index13

6papers

218citations

Novelty49%

AI Score25

Ranked #167,055 of 194,257 authors (top 86%)#36,430 in LG (top 91%)

6 Papers

19.2LGFeb 16, 2021

RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Wei Qiu, Xinrun Wang, Runsheng Yu et al.

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such expected, i.e., risk-neutral, Q value is not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes the failure of these methods to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution. Then, to handle the temporal nature of the stochastic outcomes during executions, we propose a dynamic risk level predictor for risk level tuning. Finally, we optimize the CVaR policies with CVaR values used to estimate the target in TD error during centralized training and the CVaR values are used as auxiliary local rewards to update the local distribution via Quantile Regression loss. Empirically, we show that our method significantly outperforms state-of-the-art methods on challenging StarCraft II tasks, demonstrating enhanced coordination and improved sample efficiency.

9.0LGAug 21, 2020

Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication

Xu He, Bo An, Yanghua Li et al.

With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules to recommend items with different properties such as huge discounts. A web page often consists of different independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which might result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach with the restriction that different modules cannot communicate. Our contributions are three-fold. Firstly, inspired by a solution concept in game theory named correlated equilibrium, we design a signal network to promote cooperation of all modules by generating signals (vectors) for different modules. Secondly, an entropy-regularized version of the signal network is proposed to coordinate agents' exploration of the optimal global policy. Furthermore, experiments based on real-world e-commerce data demonstrate that our algorithm obtains superior performance over baselines.

3.3LGJun 26, 2020

Continual Learning from the Perspective of Compression

Xu He, Min Lin

Connectionist models such as neural networks suffer from catastrophic forgetting. In this work, we study this problem from the perspective of information theory and define forgetting as the increase of description lengths of previous data when they are compressed with a sequentially learned model. In addition, we show that continual learning approaches based on variational posterior approximation and generative replay can be considered as approximations to two prequential coding methods in compression, namely, the Bayesian mixture code and maximum likelihood (ML) plug-in code. We compare these approaches in terms of both compression and forgetting and empirically study the reasons that limit the performance of continual learning methods based on variational posterior approximation. To address these limitations, we propose a new continual learning method that combines ML plug-in and Bayesian mixture codes.

2.2ROMar 3, 2020

Augmented Reality on the Large Scene Based on a Markerless Registration Framework

Zhen Ma, He Xu, Yonghui Zhang et al.

In this paper, a mobile camera positioning method based on forward and inverse kinematics of robot is proposed, which can realize far point positioning of imaging position and attitude tracking in large scene enhancement. Orbit precision motion through the framework overhead cameras and combining with the ground system of sensor array object such as mobile robot platform of various sensors, realize the good 3 d image registration, solve any artifacts that is mobile robot in the large space position initialization problem, effectively implement the large space no marks augmented reality, human-computer interaction, and information summary. Finally, the feasibility and effectiveness of the method are verified by experiments.

28.3MLJun 12, 2019

Task Agnostic Continual Learning via Meta Learning

Xu He, Jakub Sygnowski, Alexandre Galashov et al.

While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distribution is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, the knowledge of task boundaries, and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering -- i.e measuring how quickly the network recovers performance rather than measuring the network's performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific for the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task specific parameters from task agnostic parameters, where the latter are optimized in a continual meta learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implication of the proposed formalism.

11.9NEJul 16, 2017

Overcoming Catastrophic Interference by Conceptors

Xu He, Herbert Jaeger

Catastrophic interference has been a major roadblock in the research of continual learning. Here we propose a variant of the back-propagation algorithm, "conceptor-aided back-prop" (CAB), in which gradients are shielded by conceptors against degradation of previously learned tasks. Conceptors have their origin in reservoir computing, where they have been previously shown to overcome catastrophic forgetting. CAB extends these results to deep feedforward networks. On the disjoint MNIST task CAB outperforms two other methods for coping with catastrophic interference that have recently been proposed in the deep learning field.