Safa Alver

LG
7 papers
53 citations
Novelty: 36%
AI Score: 22

7 Papers

LG · Jun 16, 2022
A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Safa Alver, Doina Precup

In model-based reinforcement learning (RL), an agent can leverage a learned model to improve its behavior in several ways. Two prevalent ones are decision-time planning and background planning. In this study, we are interested in understanding how the value-based versions of these two planning methods compare against each other across different settings. Towards this goal, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one will perform better in the regular RL and transfer learning settings. Then, we consider the modern instantiations of them and provide hypotheses on which one will perform better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest that even though value-based versions of the two planning methods perform on par in their simplest instantiations, the modern instantiations of value-based decision-time planning methods can perform on par with or better than the modern instantiations of value-based background planning methods in both the regular RL and transfer learning settings.

LG · Jan 24, 2023
Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

Safa Alver, Doina Precup

Learning models of the environment from pure interaction is often considered an essential component of building lifelong reinforcement learning agents. However, the common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment, regardless of whether they are important in coming up with optimal decisions or not. In this paper, we argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios and we propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models". After providing a formal definition for these models, we provide theoretical results demonstrating the scalability advantages of performing planning with such models and then perform experiments to empirically illustrate our theoretical results. Then, we provide some useful heuristics on how to learn these kinds of models with deep learning architectures and empirically demonstrate that models learned in such a way can allow for performing planning that is robust to distribution shifts and compounding model errors. Overall, both our theoretical and empirical results suggest that minimal value-equivalent partial models can provide significant benefits to performing scalable and robust planning in lifelong reinforcement learning scenarios.

AI · Sep 30, 2023
Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Mingde Zhao, Safa Alver, Harm van Seijen et al.

Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.

LG · Dec 30, 2021
Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates

Safa Alver, Doina Precup

We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks with no or very little new data. Specifically, we consider the framework of generalized policy evaluation and improvement, in which the rewards for all tasks of interest are assumed to be expressible as a linear combination of a fixed set of features. We show theoretically that, under certain assumptions, having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance on all possible downstream tasks which are typically more complex than the ones on which the agent was trained. Based on this theoretical analysis, we propose a simple algorithm that iteratively constructs this set of policies. In addition to empirically validating our theoretical results, we compare our approach with recently proposed diverse policy set construction methods and show that, while others fail, our approach is able to build a behavior basis that enables instantaneous transfer to all possible downstream tasks. We also show empirically that having access to a set of independent policies can better bootstrap the learning process on downstream tasks where the new reward function cannot be described as a linear combination of the features. Finally, we demonstrate how this policy set can be useful in a lifelong reinforcement learning setting.
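
To make the linear-reward assumption concrete, the following is a minimal sketch of generalized policy evaluation and improvement at a single state. It assumes each policy in the set comes with expected discounted feature sums (often called successor features) per action, so that its action values on a task with weight vector `w` are a dot product. The names `gpe`, `gpi_action`, `psis`, and the numbers are illustrative assumptions, not taken from the paper, and the sketch does not cover how the paper constructs its set of independent policies.

```python
def gpe(psi, w):
    """Generalized policy evaluation at one state.

    psi[a] holds the (assumed given) expected discounted feature sums for
    taking action a and then following the policy; rewards are assumed to be
    linear in the features with weights w, so Q(a) = psi[a] . w.
    """
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(psis, w):
    """Generalized policy improvement at one state.

    For each action, take the best value any policy in the set achieves,
    then act greedily with respect to that upper envelope.
    """
    q_per_policy = [gpe(psi, w) for psi in psis]
    n_actions = len(q_per_policy[0])
    best_q = [max(q[a] for q in q_per_policy) for a in range(n_actions)]
    action = max(range(n_actions), key=best_q.__getitem__)
    return action, best_q


# Hypothetical example: two policies, two actions, two reward features.
psis = [
    [[1.0, 0.0], [0.2, 0.1]],  # policy 1: feature sums for actions 0 and 1
    [[0.0, 0.3], [0.1, 1.0]],  # policy 2
]
w = [0.5, 1.0]  # weights defining a new task
action, best_q = gpi_action(psis, w)
```

Because a new task only changes `w`, the values of every stored policy can be recomputed without new environment data, which is what makes transfer within the span of the features immediate in this framework.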

LG · Apr 29, 2021
What is Going on Inside Recurrent Meta Reinforcement Learning Agents?

Safa Alver, Doina Precup

Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm". After being trained on a pre-specified task distribution, the learned weights of the agent's RNN are said to implement an efficient learning algorithm through their activity dynamics, which allows the agent to quickly solve new tasks sampled from the same distribution. However, due to the black-box nature of these agents, the way in which they work is not yet fully understood. In this study, we shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework. We hypothesize that the learned activity dynamics is acting as belief states for such agents. Several illustrative experiments suggest that this hypothesis is true, and that recurrent meta-RL agents can be viewed as agents that learn to act optimally in partially observable environments consisting of multiple related tasks. This view helps in understanding their failure cases and some interesting model-based results reported in the literature.
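
As an illustration of what a belief state means in this multi-task POMDP view, here is a minimal sketch of an explicit Bayesian belief update over which task the agent is in. It assumes a hypothetical task distribution of two-armed Bernoulli bandits with known per-task success probabilities; `update_belief` and `task_probs` are illustrative names, and the setup is not one of the paper's experiments. The hypothesis in the abstract is that the RNN's activity plays the role of such a belief, not that the agent computes it in this explicit form.

```python
def update_belief(belief, task_probs, arm, reward):
    """One Bayes-rule update of the belief over tasks.

    belief[i]        : current probability that the agent is in task i
    task_probs[i][a] : probability that arm a pays reward 1 in task i
    arm, reward      : the observed action and binary reward (0 or 1)
    """
    likelihoods = [
        p[arm] if reward == 1 else 1.0 - p[arm] for p in task_probs
    ]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under every task")
    return [u / total for u in unnormalized]


# Hypothetical task distribution: arm 0 is good in task 0, arm 1 in task 1.
task_probs = [[0.9, 0.1], [0.1, 0.9]]
belief = [0.5, 0.5]                                   # uniform prior over tasks
belief = update_belief(belief, task_probs, arm=0, reward=1)
```

After a single rewarded pull of arm 0, the belief shifts strongly toward task 0, which is the kind of rapid within-episode adaptation that recurrent meta-RL agents are described as exhibiting.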

LG · Jun 12, 2020
A Brief Look at Generalization in Visual Meta-Reinforcement Learning

Safa Alver, Doina Precup

Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training environments, there have been several studies that investigated the generalization performance of these algorithms. However, there has been no similar study that evaluated the generalization performance of algorithms that were specifically designed for generalization, i.e. meta-reinforcement learning algorithms. In this paper, we assess the generalization performance of these algorithms by leveraging high-dimensional, procedurally generated environments. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks. We also observe that scalability to high-dimensional tasks with sparse rewards remains a significant problem among many of the current meta-reinforcement learning algorithms. With these results, we highlight the need for developing meta-reinforcement learning algorithms that can both generalize and scale.

CV · Aug 10, 2019
Attentive Deep Regression Networks for Real-Time Visual Face Tracking in Video Surveillance

Safa Alver, Ugur Halici

Visual face tracking is one of the most important tasks in video surveillance systems. However, due to the variations in pose, scale, expression, and illumination, it is considered to be a difficult task. Recent studies show that deep learning methods have a significant potential in object tracking tasks and adaptive feature selection methods can boost their performance. Motivated by these findings, we propose an end-to-end attentive deep learning based tracker that is built on top of the state-of-the-art GOTURN tracker, for the task of real-time visual face tracking in video surveillance. Our method outperforms the state-of-the-art GOTURN and IVT trackers by very large margins and it achieves speeds that are very far beyond the requirements of real-time tracking. Additionally, to overcome the scarce data problem in visual face tracking, we also provide bounding box annotations for the G1 and G2 sets of the ChokePoint dataset and make it suitable for further studies in face tracking under surveillance conditions.