Hassam Ullah Sheikh

h-index: 4

Papers: 8

Citations: 84

Novelty: 43%

AI Score: 24

Ranked #171,731 of 194,257 authors (top 88%); #37,287 in LG (top 93%)

8 Papers

7.8 · LG · Jan 31, 2022 · Code

DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning

Hassam Sheikh, Kizza Frisbee, Mariano Phielipp

The application of ensembles of neural networks is becoming an important tool for advancing the state of the art in deep reinforcement learning algorithms. However, training the large number of neural networks in an ensemble has an exceedingly high computation cost, which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that uses a k-DPP to sample a subset of neural networks for backpropagation at every training step, thus significantly reducing the training time and computation cost. We integrated DNS into REDQ for continuous control tasks and evaluated it on MuJoCo environments. Our experiments show that DNS-augmented REDQ outperforms baseline REDQ in terms of average cumulative reward and achieves this using less than 50% of the computation when measured in FLOPs.
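
As a rough illustration of the k-DPP idea mentioned in this abstract, the sketch below draws a size-k subset of ensemble members with probability proportional to the determinant of the matching kernel submatrix. It is a brute-force enumeration suitable only for tiny ensembles, not the paper's implementation; the feature vectors, the Gram-matrix kernel, and the names `det`, `gram`, and `sample_k_dpp` are all assumptions made for the example.

```python
import itertools
import random


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # singular submatrix
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def gram(features):
    """Hypothetical similarity kernel: inner products of per-network features."""
    return [[sum(x * y for x, y in zip(u, v)) for v in features] for u in features]


def sample_k_dpp(kernel, k, rng):
    """Sample a k-subset S with P(S) proportional to det(kernel[S, S]).

    Enumerates every k-subset, so it only scales to small ensembles.
    Raises ValueError if every candidate subset has zero determinant.
    """
    subsets = list(itertools.combinations(range(len(kernel)), k))
    weights = [
        max(det([[kernel[i][j] for j in s] for i in s]), 0.0) for s in subsets
    ]
    return rng.choices(subsets, weights=weights, k=1)[0]


# Networks 0 and 1 have identical (made-up) features, so they are never picked together.
kernel = gram([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
chosen = sample_k_dpp(kernel, 2, random.Random(0))
```

In a training loop, only the networks indexed by `chosen` would receive a gradient update at that step; similar networks are unlikely to be selected together because their submatrix determinant is small.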

6.1 · AI · Jun 15, 2021

Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning

Varun Kumar Vijay, Hassam Sheikh, Somdeb Majumdar et al.

Inter-agent communication can significantly increase performance in multi-agent tasks that require co-ordination to achieve a shared goal. Prior work has shown that it is possible to learn inter-agent communication protocols using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent communicates with all other agents at every step, even when the task does not require it. In real-world applications, where communication may be limited by system constraints like bandwidth, power and network capacity, one might need to reduce the number of messages that are sent. In this work, we explore a simple method of minimizing communication while maximizing performance in multi-task learning: simultaneously optimizing a task-specific objective and a communication penalty. We show that the objectives can be optimized using Reinforce and the Gumbel-Softmax reparameterization. We introduce two techniques to stabilize training: 50% training and message forwarding. Training with the communication penalty on only 50% of the episodes prevents our models from turning off their outgoing messages. Second, repeating messages received previously helps models retain information, and further improves performance. With these techniques, we show that we can reduce communication by 75% with no loss of performance.
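
The combination of a task objective with a communication penalty described in this abstract can be sketched as follows. This is a minimal, framework-free illustration and not the authors' code: the penalty weight `lam`, the temperature `tau`, the two-way send/skip gate, and the function names are assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def episode_loss(task_loss, send_probs, lam, penalize):
    """Task loss plus an optional penalty on the expected number of messages."""
    if not penalize:
        return task_loss
    return task_loss + lam * sum(send_probs)


rng = random.Random(0)
# Each agent has a hypothetical [skip, send] gate; index 1 is the send probability.
gates = [gumbel_softmax([0.0, 0.0], tau=1.0, rng=rng) for _ in range(3)]
send_probs = [g[1] for g in gates]
# Apply the penalty on roughly half of the episodes, as the abstract describes.
penalize = rng.random() < 0.5
loss = episode_loss(task_loss=1.0, send_probs=send_probs, lam=0.1, penalize=penalize)
```

In a differentiable framework the relaxed gate would let gradients of the penalty flow back into the logits that decide whether to send a message.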

5.0 · LG · Oct 8, 2020

Learning Intrinsic Symbolic Rewards in Reinforcement Learning

Hassam Sheikh, Shauharda Khadka, Santiago Miret et al.

Learning effective policies for sparse objectives is a key challenge in Deep Reinforcement Learning (RL). A common approach is to design task-related dense rewards to improve task learnability. While such rewards are easily interpreted, they rely on heuristics and domain expertise. Alternate approaches that train neural networks to discover dense surrogate rewards avoid heuristics, but are high-dimensional, black-box solutions offering little interpretability. In this paper, we present a method that discovers dense rewards in the form of low-dimensional symbolic trees, thus making them more tractable for analysis. The trees use simple functional operators to map an agent's observations to a scalar reward, which then supervises the policy gradient learning of a neural network policy. We test our method on continuous action spaces in MuJoCo and discrete action spaces in Atari and Pygame environments. We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks. Notably, we significantly outperform a widely used, contemporary neural-network-based reward-discovery algorithm in all environments considered.
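
To make the idea of a symbolic reward tree concrete, the sketch below evaluates a small expression tree that maps an observation vector to a scalar reward. The operator set, the nested-tuple encoding, and the example tree are hypothetical choices for illustration, not the operators or trees discovered in the paper.

```python
import math

# Hypothetical operator set; the paper's actual operators may differ.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "neg": lambda a: -a,
    "tanh": math.tanh,
}


def evaluate(tree, obs):
    """Recursively evaluate a nested-tuple expression tree on one observation."""
    kind = tree[0]
    if kind == "obs":  # leaf: read one observation component
        return obs[tree[1]]
    if kind == "const":  # leaf: fixed constant
        return tree[1]
    args = [evaluate(child, obs) for child in tree[1:]]
    return OPS[kind](*args)


# Example tree encoding reward = 2 * obs[0] - obs[1].
reward_tree = ("add", ("mul", ("const", 2.0), ("obs", 0)), ("neg", ("obs", 1)))
reward = evaluate(reward_tree, [1.0, 0.5])
```

Because the tree is a short, readable expression, the resulting reward can be inspected directly, which is the interpretability advantage the abstract contrasts with neural-network reward models.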

5.8 · LG · Jun 24, 2020

Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity

Hassam Ullah Sheikh, Ladislau Bölöni

The classic DQN algorithm is limited by the overestimation bias of the learned Q-function. Subsequent algorithms have proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms have used different estimates provided by the ensembles of learners to reduce the overestimation bias. Unfortunately, these learners can converge to the same point in the parametric or representation space, falling back to the classic single neural network DQN. In this paper, we describe a regularization technique to maximize ensemble diversity in these algorithms. We propose and compare five regularization functions inspired by economics theory and consensus optimization. We show that the regularized approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.

2.9 · CR · Jun 4, 2020

Automatic Feature Extraction, Categorization and Detection of Malicious Code in Android Applications

Muhammad Zuhair Qadir, Atif Nisar Jilani, Hassam Ullah Sheikh

Android has recently become a popular software platform for mobile devices, which now offer almost the same functionality as personal computers, and malware has become a major concern. As the number of new Android applications is expected to increase rapidly in the near future, there is a need for fast and efficient automatic malware detection. In this paper, we define a simple static analysis approach that first extracts the features of an Android application based on its intents and categorizes the application into a known major category, then maps it to the permissions requested by the application and compares it with the most obvious intents of the category. As a result, we can identify which apps use features that they are not supposed to use or do not need.

6.6 · MA · Mar 24, 2020

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Hassam Ullah Sheikh, Ladislau Bölöni

Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms, which are designed to maximize either the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves significantly better and more stable performance than the direct adaptation of the MADDPG algorithm.

2.3 · MA · Aug 24, 2019

Universal Policies to Learn Them All

Hassam Ullah Sheikh, Ladislau Bölöni

We explore a collaborative and cooperative multi-agent reinforcement learning setting where a team of reinforcement learning agents attempts to solve a single cooperative task in a multi-scenario setting. We propose a novel multi-agent reinforcement learning algorithm inspired by universal value function approximators that generalizes not only over the state space but also over a set of different scenarios. Additionally, to support our claim, we introduce a challenging 2D multi-agent urban security environment where the learning agents try to protect a person from nearby bystanders in a variety of scenarios. Our study shows that state-of-the-art multi-agent reinforcement learning algorithms fail to generalize a single task over multiple scenarios, while our proposed solution performs as well as scenario-dependent policies.

1.2 · MA · Jan 28, 2019

Designing a Multi-Objective Reward Function for Creating Teams of Robotic Bodyguards Using Deep Reinforcement Learning

Hassam Ullah Sheikh, Ladislau Bölöni

We are considering a scenario where a team of bodyguard robots provides physical protection to a VIP in a crowded public space. We use deep reinforcement learning to learn the policy to be followed by the robots. As the robot bodyguards need to follow several difficult-to-reconcile goals, we study several primitive and composite reward functions and their impact on the overall behavior of the robotic bodyguards.