Karam Daaboul

LG
4 papers
36 citations
Novelty: 59%
AI Score: 42

4 Papers

LG · May 20
CIG: Exploration via Conditional Information Gain

Tim Joseph, Marcus Fechner, Philipp Stegmaier et al.

Intrinsic rewards for exploration in reinforcement learning condition on different contexts: lifelong rewards score each transition against accumulated experience but ignore within-rollout redundancy; episodic rewards penalize intra-trajectory repetition but discard lifetime progress. Hybrid methods combine both signals through heuristic weights or require Gaussian-process dynamics that do not scale beyond low-dimensional state spaces. Trajectory-level information gain decomposes into per-step terms that condition on the replay buffer and rollout prefix simultaneously, but remains intractable for deep models. We derive the Conditional Information Gain (CIG) reward as a tractable surrogate: a log-determinant objective over an ensemble disagreement kernel whose Cholesky factorization yields causal per-step rewards that retain both conditioning sets while scaling to high-dimensional state spaces. We instantiate CIG in a model-based setting, where rollouts are short and within-rollout corrections remain largely unexplored. Across twelve tasks spanning discrete (MiniGrid) and continuous control (OGBench), in both clean and stochastic-distractor settings, CIG outperforms or matches prior exploration methods while remaining robust to stochastic distractors.
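The abstract's key step is splitting a log-determinant into causal per-step rewards through a Cholesky factor. That step rests on a standard identity: for a positive-definite matrix A = L Lᵀ, log det A = Σ_t log L_tt², and L_tt² is the conditional variance of point t given every point ordered before it. The sketch below illustrates only that identity and is not the paper's implementation. Several choices are our own assumptions: the kernel construction (mean inner product of ensemble deviations), the noise variance `noise_var`, the 0.5·log scaling, and the use of the hypothetical `n_context` argument to place replay-buffer points ahead of the rollout prefix.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def disagreement_kernel(preds: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Assumed kernel: K[i][j] = mean over members m of <dev_m(i), dev_m(j)>,
    where preds[m][i] is member m's prediction vector for point i and
    dev_m(i) is that prediction minus the ensemble mean at point i."""
    n_members, n_points, dim = len(preds), len(preds[0]), len(preds[0][0])
    means = [
        [sum(preds[m][i][d] for m in range(n_members)) / n_members for d in range(dim)]
        for i in range(n_points)
    ]
    devs = [
        [[preds[m][i][d] - means[i][d] for d in range(dim)] for i in range(n_points)]
        for m in range(n_members)
    ]
    return [
        [
            sum(
                sum(a * b for a, b in zip(devs[m][i], devs[m][j]))
                for m in range(n_members)
            ) / n_members
            for j in range(n_points)
        ]
        for i in range(n_points)
    ]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def per_step_gains(kernel: Matrix, noise_var: float, n_context: int) -> List[float]:
    """Causal per-point gains whose total is 0.5 * log det(I + K / noise_var).

    The first n_context points (assumed: replay-buffer states) only condition
    later points; gains are returned for the remaining (rollout) points."""
    n = len(kernel)
    # Adding noise_var to the diagonal keeps the PSD kernel positive definite.
    a = [[kernel[i][j] + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(a)
    # low[t][t] ** 2 is the conditional variance of point t given points 0..t-1.
    gains = [0.5 * math.log(low[t][t] ** 2 / noise_var) for t in range(n)]
    return gains[n_context:]
```

Each gain depends only on points earlier in the ordering, so the rewards are causal. A point that repeats earlier ensemble behaviour earns little. For example, with `per_step_gains(disagreement_kernel([[[1.0], [1.0]], [[-1.0], [-1.0]]]), 0.1, 0)` the second, identical point scores about 0.32, compared with about 1.20 for the first.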

LG · Jun 20, 2024
Constrained Meta Agnostic Reinforcement Learning

Karam Daaboul, Florian Kuhm, Tim Joseph et al.

Meta-Reinforcement Learning (Meta-RL) aims to acquire meta-knowledge for quick adaptation to diverse tasks. However, deploying such policies in real-world environments poses a significant challenge: balancing rapid adaptability with adherence to environmental constraints. Our novel approach, Constrained Model-Agnostic Meta-Learning (C-MAML), merges meta-learning with constrained optimization to address this challenge. C-MAML enables rapid and efficient task adaptation by incorporating task-specific constraints directly into its meta-algorithm framework during the training phase. This fusion results in safer initial parameters for learning new tasks. We demonstrate the effectiveness of C-MAML on simulated wheeled-robot locomotion tasks of varying complexity, highlighting its practicality and robustness in dynamic environments.

LG · Apr 14, 2021
Safe Continuous Control with Constrained Model-Based Policy Optimization

Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need that is difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast to the traditional RL objective, safe exploration maximizes expected returns subject to safety constraints expressed as expected cost returns. To address the often prohibitively high sample complexity of model-free safe exploration algorithms, we introduce a model-based safe exploration algorithm for constrained high-dimensional control. Further, we provide theoretical and empirical analyses of the implications of model usage for constrained policy optimization, and we introduce a practical algorithm that accelerates policy search with model-generated data. The need for accurate estimates of a policy's constraint satisfaction conflicts with accumulating model errors. We address this issue by quantifying model uncertainty as the expected Kullback-Leibler divergence between the predictions of an ensemble of probabilistic dynamics models and constraining this error measure, which yields an adaptive resampling scheme and dynamically limited rollout horizons. We evaluate this approach on several simulated constrained robot locomotion tasks with high-dimensional action and state spaces. Our empirical studies find that our algorithm reaches model-free performance with a 10- to 20-fold reduction in training samples while approximately maintaining the constraint-satisfaction levels of model-free methods.
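The abstract names its uncertainty measure: an expected KL divergence between the predictions of ensemble members. The code below is a minimal sketch of that measure and not the paper's algorithm. It rests on three assumptions of ours. Each member is taken to output a diagonal Gaussian over the next state. The expectation is taken uniformly over ordered pairs of distinct members. A hypothetical fixed `threshold` cuts a rollout at the first step whose disagreement exceeds it. The function names `kl_diag_gauss`, `expected_pairwise_kl` and `truncate_rollout` are illustrative. The adaptive resampling scheme is not modeled.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

# (mean vector, standard-deviation vector) of a diagonal Gaussian over the next state.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gauss(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    mu_p, sd_p = p
    mu_q, sd_q = q
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, sd_p, mu_q, sd_q)
    )


def expected_pairwise_kl(ensemble: Sequence[Gaussian]) -> float:
    """Assumed uncertainty measure: mean KL over all ordered pairs of distinct members."""
    pairs = list(permutations(range(len(ensemble)), 2))
    return sum(kl_diag_gauss(ensemble[i], ensemble[j]) for i, j in pairs) / len(pairs)


def truncate_rollout(step_predictions: Sequence[Sequence[Gaussian]], threshold: float) -> int:
    """Hypothetical horizon rule: keep model steps until disagreement first exceeds threshold.

    step_predictions[t] holds every member's predicted Gaussian at rollout step t.
    Returns the number of steps to keep."""
    for t, ensemble in enumerate(step_predictions):
        if expected_pairwise_kl(ensemble) > threshold:
            return t
    return len(step_predictions)
```

KL divergence is asymmetric, but averaging over ordered pairs makes the measure symmetric in the members. The measure grows with both mean disagreement and variance mismatch. For two members `([0.0], [1.0])` and `([1.0], [1.0])` it equals 0.5.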

LG · Feb 12, 2021
Generalizing Decision Making for Automated Driving with an Invariant Environment Representation using Deep Reinforcement Learning

Karl Kurzer, Philip Schörner, Alexander Albers et al.

Data-driven approaches to decision making for automated driving require appropriate generalization strategies to ensure applicability to the world's variability. Current approaches either do not generalize well beyond the training data or cannot consider a variable number of traffic participants. We therefore propose an invariant environment representation from the perspective of the ego vehicle. The representation encodes all information necessary for safe decision making. To assess the generalization capabilities of this novel environment representation, we train our agents on a small subset of scenarios and evaluate them on the entire diverse set of scenarios. We show that, owing to this abstraction, the agents generalize successfully to unseen scenarios. In addition, we present a simple occlusion model that enables our agents to navigate intersections with occlusions without a significant change in performance.