SY · Apr 29, 2016
Predicting Lane Keeping Behavior of Visually Distracted Drivers Using Inverse Suboptimal Control
Felix Schmitt, Hans-Joachim Bieg, Dietrich Manstetten et al.
Driver distraction strongly contributes to crash risk. Therefore, assistance systems that warn the driver if her distraction poses a hazard to road safety promise a great safety benefit. Current approaches either seek to detect critical situations using environmental sensors or estimate a driver's attention state solely from her behavior. However, this neglects that the driving situation, driver deficiencies, and compensation strategies together determine the risk of an accident. This work proposes to use inverse suboptimal control to predict these aspects in visually distracted lane keeping. In contrast to other approaches, this allows a situation-dependent assessment of the risk posed by distraction. Real traffic data of seven drivers are used to evaluate the predictive power of our approach. For comparison, a baseline was built using established behavior models. In the evaluation, our method achieves a consistently lower prediction error over speed and track-topology variations. Additionally, our approach generalizes better to driving speeds unseen in the training phase.
SY · Jul 19, 2016
Exact Maximum Entropy Inverse Optimal Control for Modelling Human Attention Switching and Control
Felix Schmitt, Hans-Joachim Bieg, Dietrich Manstetten et al.
Maximum Causal Entropy (MCE) Inverse Optimal Control (IOC) has become an effective tool for modelling human behaviour in many control tasks. Its advantage over classic techniques for estimating human policies is the transferability of the inferred objectives: behaviour can be predicted in variations of the control task by policy computation using a relaxed optimality criterion. However, exact policy inference is often computationally intractable in control problems with imperfect state observation. In this work, we present a model class that allows modelling human control of two tasks of which only one can be perfectly observed at a time, which requires attention switching. We show how efficient and exact objective and policy inference via MCE can be conducted for these control problems. Both MCE-IOC and Maximum Causal Likelihood (MCL)-IOC, a variant of the original MCE approach, as well as Direct Policy Estimation (DPE) are evaluated using simulated and real behavioural data. Prediction error and generalization over changes in the control process are both considered in the evaluation. The results show a clear advantage of both IOC methods over DPE, especially in the transfer over variation of the control process. MCE and MCL performed similarly when training on a large set of simulated data, but differed significantly on small sets and real data.
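The "relaxed optimality criterion" mentioned in the abstract is commonly realized as a soft Bellman recursion, in which the hard maximum over actions is replaced by a log-sum-exp. The following is a minimal sketch for a fully observed finite MDP only; it is not the paper's attention-switching model, and the toy MDP, the function names, and the discount factor are all assumptions made for illustration.

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a list of floats.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def q_values(P, R, V, gamma):
    # P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
    return [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
             for a in range(len(P[s]))] for s in range(len(P))]

def soft_value_iteration(P, R, gamma=0.9, iters=200):
    # Soft Bellman backup: V(s) = log sum_a exp Q(s, a).
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [logsumexp(q) for q in q_values(P, R, V, gamma)]
    Q = q_values(P, R, V, gamma)
    # Stochastic policy: pi(a | s) is proportional to exp Q(s, a).
    policy = [[math.exp(x - logsumexp(q)) for x in q] for q in Q]
    return V, policy

# Hypothetical two-state MDP: in state 0, action 1 moves to state 1 and earns
# reward 1; every other action keeps the state and earns reward 0.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(1, 1.0)]]]
R = [[0.0, 1.0],
     [0.0, 0.0]]
V, policy = soft_value_iteration(P, R)
```

In this toy problem the resulting policy prefers the rewarded action in state 0 without choosing it deterministically, and is uniform in state 1 where both actions are equivalent. That graded preference is what lets such a model assign non-zero likelihood to imperfect human behaviour.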
RO · Sep 23, 2021
Hierarchies of Planning and Reinforcement Learning for Robot Navigation
Jan Wöhlke, Felix Schmitt, Herke van Hoof
Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse rewards and long decision horizons. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation and using sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the complex dynamics and sub-optimal sub-goal-reaching capabilities of the robot during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation. Thereby, robot capabilities and environment conditions can be learned from collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL yields consistently strong improvements over vanilla RL, is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and is on par with trainable HL path planning baselines except for a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
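To make the idea of value iteration over a learned transition model concrete, here is a minimal sketch. It assumes a simple count-based estimate of the HL transition probabilities from rollout tuples as a stand-in; the paper's trainable planning policy may differ substantially. The three-cell layout, the rollout data, and all function names are hypothetical.

```python
def estimate_transitions(rollouts, n_states, n_actions, prior=1e-3):
    # Count observed (state, action, next_state) tuples; the small prior
    # keeps unseen transitions from getting exactly zero probability.
    counts = [[[prior] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    for s, a, s2 in rollouts:
        counts[s][a][s2] += 1.0
    return [[[c / sum(row) for c in row] for row in counts[s]]
            for s in range(n_states)]

def action_values(T, reward, V, s, gamma):
    # Expected return of each action in state s under the learned model T.
    return [sum(T[s][a][s2] * (reward[s2] + gamma * V[s2])
                for s2 in range(len(T)))
            for a in range(len(T[s]))]

def value_iteration(T, reward, gamma=0.95, iters=200):
    V = [0.0] * len(T)
    for _ in range(iters):
        V = [max(action_values(T, reward, V, s, gamma)) for s in range(len(T))]
    return V

def greedy_action(T, reward, V, s, gamma=0.95):
    q = action_values(T, reward, V, s, gamma)
    return max(range(len(q)), key=q.__getitem__)

# Hypothetical HL layout: cells 0 - 1 - 2 in a row, cell 2 is the goal.
# Actions: 0 = left, 1 = right. Moving right sometimes fails, which the
# counts capture as imperfect sub-goal reaching.
rollouts = ([(0, 1, 1)] * 8 + [(0, 1, 0)] * 2 + [(0, 0, 0)] * 10 +
            [(1, 1, 2)] * 9 + [(1, 1, 1)] * 1 + [(1, 0, 0)] * 10 +
            [(2, 0, 2)] * 5 + [(2, 1, 2)] * 5)
reward = [0.0, 0.0, 1.0]  # reward received on entering each cell
T = estimate_transitions(rollouts, n_states=3, n_actions=2)
V = value_iteration(T, reward)
plan = [greedy_action(T, reward, V, s) for s in range(3)]
```

In a hierarchical setup, the greedy HL action would be turned into a sub-goal for the low-level RL policy. Because the model is estimated from rollouts, unreliable sub-goal transitions lower the planned value of the routes that depend on them.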
LG · Apr 28, 2021
Reward (Mis)design for Autonomous Driving
W. Bradley Knox, Alessandro Allievi, Holger Banzhaf et al.
This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaws in reward functions. These sanity checks are applied to reward functions from past work on reinforcement learning (RL) for autonomous driving (AD), revealing near-universal flaws in reward design for AD that might also exist pervasively across reward design for other tasks. Lastly, we explore promising directions that may aid the design of reward functions for AD in subsequent research, following a process of inquiry that can be adapted to other domains.
AI · Apr 13, 2016
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
Michael Herman, Tobias Gindele, Jörg Wagner et al.
Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from the observed behavior of an agent. Since the agent's behavior originates in its policy, and MDP policies depend on both the stochastic system dynamics and the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, either additional samples from the system's dynamics are accessible or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements in sample efficiency as well as in the accuracy of the estimated reward functions and transition models.
SE · Apr 24, 2013
Software Design Principles of a DFS Tower A-CWP Prototype
Felix Schmitt, Ralf Heidger, Stephen Straub et al.
SESAR is supposed to boost the development of new operational procedures together with the supporting systems in order to modernize pan-European air traffic management (ATM). One consequence of this development is that more and more information is presented to, and has to be processed by, air traffic control officers (ATCOs). Thus, there is a strong need for a software design concept that fosters the development of an advanced (tower) controller working position (A-CWP) that comprehensively integrates the still growing amount of information while reducing the data management workload of ATCOs. We report on our first hands-on experiences obtained during the development of an A-CWP prototype that was used in two SESAR validation sessions.