LGOct 19, 2023
Fine-Tuning Generative Models as an Inference Method for Robotic TasksOrr Krupnik, Elisei Shafer, Tom Jurgenson et al.
Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
LGFeb 2
VLM-Guided Experience ReplayElad Sharony, Tom Jurgenson, Orr Krupnik et al.
Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have enabled powerful semantic and multimodal reasoning capabilities, creating new opportunities to enhance sample efficiency, high-level planning, and interpretability in reinforcement learning (RL). While prior work has integrated LLMs and VLMs into various components of RL, the replay buffer, a core component for storing and reusing experiences, remains unexplored. We propose addressing this gap by leveraging VLMs to guide the prioritization of experiences in the replay buffer. Our key idea is to use a frozen, pre-trained VLM (requiring no fine-tuning) as an automated evaluator to identify and prioritize promising sub-trajectories from the agent's experiences. Across scenarios, including game-playing and robotics, spanning both discrete and continuous domains, agents trained with our proposed prioritization method achieve 11-52% higher average success rates and improve sample efficiency by 19-45% compared to previous approaches. https://esharony.me/projects/vlm-rb/
LGMar 14, 2024
MAMBA: an Effective World Model Approach for Meta-Reinforcement LearningZohar Rimon, Tom Jurgenson, Orr Krupnik et al.
Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to $15\times$) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
LGJan 29, 2019
Multi-Agent Reinforcement Learning with Multi-Step Generative ModelsOrr Krupnik, Igor Mordatch, Aviv Tamar
We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace. For non-trivial dynamical systems, MBRL typically suffers from accumulating errors. Several recent studies have addressed this problem by learning latent variable models for trajectory segments and optimizing over behavior in the latent space. In this work, we investigate whether this approach can be extended to 2-agent competitive and cooperative settings. The fundamental challenge is how to learn models that capture interactions between agents, yet are disentangled to allow for optimization of each agent behavior separately. We propose such models based on a disentangled variational auto-encoder, and demonstrate our approach on a simulated 2-robot manipulation task, where one robot can either help or distract the other. We show that our approach has better sample efficiency than a strong model-free RL baseline, and can learn both cooperative and adversarial behavior from the same data.