53.4 RO May 29
Trajectory Planning for Non-Communicating Mobile Robots using Inverse Optimal Control
Nina Majer, Yannick Epple, Xin Ye et al.
To enable an efficient interaction of non-communicating mobile robots in collision avoidance scenarios, we present a novel combined trajectory planning and prediction algorithm. Inverse optimal control is used to estimate unknown goal states of all robots based on observed past trajectories. Each robot also takes the perspective of other robots in considering self-prediction and solves a joint prediction problem using the estimated goal states. The resulting predictions are then considered for planning. Simulation results of scenarios with 2-8 robots show that the median duration until all vehicles reach their goals is 9.8 % shorter than with planning based on goal states estimated under a constant-acceleration assumption. Moreover, the proposed approach never leads to the solver being unable to find a solution to the planning or prediction problem.
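To make the constant-acceleration baseline mentioned above concrete, the following is a minimal sketch of how a goal state could be extrapolated from the last three observed positions. The function name, the backward finite differences, and the fixed `horizon` are illustrative assumptions; the abstract does not specify how the baseline is implemented.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def constant_acceleration_estimate(track: List[Point], dt: float, horizon: float) -> Point:
    """Extrapolate the newest position assuming constant acceleration (hypothetical helper)."""
    if len(track) < 3:
        raise ValueError("need at least three observed positions")
    (x0, y0), (x1, y1), (x2, y2) = track[-3:]
    # Backward finite differences taken at the newest sample.
    vx, vy = (x2 - x1) / dt, (y2 - y1) / dt
    ax, ay = (x2 - 2 * x1 + x0) / dt**2, (y2 - 2 * y1 + y0) / dt**2
    # Kinematic extrapolation p + v*T + 0.5*a*T^2 over the assumed horizon T.
    gx = x2 + vx * horizon + 0.5 * ax * horizon**2
    gy = y2 + vy * horizon + 0.5 * ay * horizon**2
    return gx, gy


# A robot moving at constant velocity along the x-axis.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
goal = constant_acceleration_estimate(track, dt=1.0, horizon=3.0)
```

Such an extrapolation ignores interaction between robots, which is the kind of limitation the inverse-optimal-control estimate is meant to address.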
SY Feb 2, 2018
Full- & Reduced-Order State-Space Modeling of Wind Turbine Systems with Permanent-Magnet Synchronous Generator
Christoph M. Hackl, Martin Pfeifer, Korbinian Schechner et al.
Wind energy is an integral part of today's energy supply and one of the fastest growing sources of electricity in the world. Accurate models for wind energy conversion systems (WECSs) are of key interest for the analysis and control design of present and future energy systems. Existing control-oriented WECS models are subject to unstructured simplifications, which have not been discussed in the literature so far. Thus, this technical note first presents a thorough derivation of a physical state-space model for permanent magnet synchronous generator WECSs. The physical model considers all dynamic effects that significantly influence the system's power output, including the switching of the power electronics. The model is formulated alternatively in the $(a,b,c)$- and $(d,q)$-reference frames. Secondly, a complete control and operation management system for the wind regimes II and III and the transition between the regimes is presented. The control takes practical effects such as input saturation and integral windup into account. Thirdly, by a structured model reduction procedure, two state-space models of WECS with reduced complexity are derived: a non-switching model and a non-switching reduced-order model. The validity of the models is illustrated and compared through a numerical simulation study.
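As an illustration of moving between the $(a,b,c)$- and $(d,q)$-reference frames, here is a small sketch of a Park transformation. It assumes the amplitude-invariant convention with the d-axis aligned to phase a at zero angle; the technical note may use a different scaling or axis convention, and the function name is hypothetical.

```python
import math


def abc_to_dq(a: float, b: float, c: float, theta: float) -> tuple:
    """Amplitude-invariant Park transform (assumed convention, see lead-in)."""
    shift = 2.0 * math.pi / 3.0
    d = (2.0 / 3.0) * (
        a * math.cos(theta) + b * math.cos(theta - shift) + c * math.cos(theta + shift)
    )
    q = -(2.0 / 3.0) * (
        a * math.sin(theta) + b * math.sin(theta - shift) + c * math.sin(theta + shift)
    )
    return d, q


# A balanced three-phase signal of unit amplitude, sampled at electrical angle theta.
theta = 0.7
shift = 2.0 * math.pi / 3.0
phases = (math.cos(theta), math.cos(theta - shift), math.cos(theta + shift))
d, q = abc_to_dq(*phases, theta)
```

For a balanced signal rotating with the reference frame, the transformed quantities are constant (here `d` is close to 1 and `q` close to 0), which is why controllers are often designed in the $(d,q)$ frame.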
22.2 SY Apr 28
Inverse Linear-Quadratic Gaussian Differential Games
Lucas Günther, Felix Thömmes, Karl Handwerker et al.
This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover cost function parameters of all players, as well as noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers parameters, yielding trajectories that closely match the observed trajectories.
15.8 SY Apr 16
Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error
Armin Gießler, Albertus Johannes Malan, Sören Hohmann
In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a novel continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coerciveness, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form gradient expression of the Bellman error that induces a gradient flow. This flow converges to the optimal feedback and generates a unique trajectory which exclusively comprises stabilizing feedback policies. Additionally, this work advances interesting connections between LQR theory and Reinforcement Learning (RL) by redefining suboptimality of the Algebraic Riccati Equation (ARE) as a Bellman error, adapting a state-independent formulation, and leveraging Lyapunov equations to overcome the infinite-horizon challenge. We validate our method in a simulation and compare it to the state of the art.
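The idea of reaching the optimal gain by following a gradient through the set of stabilizing gains can be illustrated on a scalar system. The sketch below is not the Bellman-error flow of the paper: as a stand-in it uses the standard LQR cost for $\dot{x} = ax + bu$, $u = -kx$ with unit initial state, integrates the flow with a plain explicit Euler scheme, and compares the result with the closed-form Riccati gain. All function names, parameter values, and the step size are assumptions.

```python
import math


def lqr_cost(k: float, a: float, b: float, q: float, r: float) -> float:
    """Closed-loop cost p(k) from the scalar Lyapunov equation 2(a - bk)p + q + rk^2 = 0."""
    closed_loop = a - b * k
    if closed_loop >= 0.0:
        raise ValueError("gain is not stabilizing")
    return (q + r * k * k) / (-2.0 * closed_loop)


def lqr_cost_gradient(k: float, a: float, b: float, q: float, r: float) -> float:
    """Derivative of lqr_cost with respect to k, valid only for stabilizing gains."""
    closed_loop = a - b * k
    if closed_loop >= 0.0:
        raise ValueError("gain is not stabilizing")
    return (r * b * k * k - 2.0 * a * r * k - b * q) / (2.0 * closed_loop**2)


def gradient_flow(k0, a, b, q, r, step=0.05, iterations=2000):
    """Explicit Euler discretization of dk/dt = -dJ/dk (step size is an assumption)."""
    k = k0
    for _ in range(iterations):
        k -= step * lqr_cost_gradient(k, a, b, q, r)
    return k


a, b, q, r = 1.0, 1.0, 1.0, 1.0  # unstable open loop, unit weights
k_flow = gradient_flow(3.0, a, b, q, r)
# Closed-form optimal gain from the scalar algebraic Riccati equation.
k_riccati = (a + math.sqrt(a * a + b * b * q / r)) / b
```

In this example the iterates decrease monotonically from the initial gain toward the Riccati gain and remain stabilizing throughout, mirroring the property the abstract states for its own flow.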
LG Aug 4, 2024
Scenario-based Thermal Management Parametrization Through Deep Reinforcement Learning
Thomas Rudolf, Philip Muhl, Sören Hohmann et al.
The thermal system of battery electric vehicles demands advanced control. Its thermal management needs to effectively control active components across varying operating conditions. While robust control function parametrization is required, current methodologies show significant drawbacks. They consume considerable time, human effort, and extensive real-world testing. Consequently, there is a need for innovative and intelligent solutions that are capable of autonomously parametrizing embedded controllers. Addressing this issue, our paper introduces a learning-based tuning approach. We propose a methodology that benefits from automated scenario generation for increased robustness across vehicle usage scenarios. Our deep reinforcement learning agent processes the tuning task context and incorporates an image-based interpretation of embedded parameter sets. We demonstrate its applicability to a valve controller parametrization task and verify it in real-world vehicle testing. The results highlight its competitive performance compared with baseline methods. This novel approach contributes to the shift towards virtual development of thermal management functions, with promising potential for large-scale parameter tuning in the automotive industry.
RO Sep 16, 2024
Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning
Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek et al.
Autonomous mobile robots are increasingly used in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains challenging to indicate when and why the policy is uncertain in novel or perturbed scenarios. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL navigation framework for policy distribution uncertainty estimates. To this end, we incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of deep ensembles and Monte-Carlo dropout (MC-dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show improved training performance with ODV and dropout in PPO and reveal that the training scenario has an impact on the generalization. In addition, MC-dropout is more sensitive to perturbations and correlates the uncertainty type to the perturbation better. With the safe action selection, the robot can navigate in perturbed environments with fewer collisions.
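The split into aleatoric, epistemic, and predictive uncertainty can be sketched with the law of total variance, a decomposition commonly used with deep ensembles and MC-dropout. The snippet assumes each ensemble member (or dropout pass) outputs a Gaussian mean and variance for a scalar action; whether the paper uses exactly this estimator is not stated in the abstract, and the numbers are made up for illustration.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Return (aleatoric, epistemic, predictive) variance for one scalar output."""
    aleatoric = fmean(variances)   # average noise predicted by the members
    epistemic = pvariance(means)   # disagreement between the members
    return aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of four stochastic forward passes for the same observation.
means = [0.9, 1.0, 1.1, 1.0]
variances = [0.04, 0.05, 0.03, 0.04]
aleatoric, epistemic, predictive = decompose_uncertainty(means, variances)
```

A rule such as "fall back to conservative collision avoidance when `predictive` exceeds a threshold" would be one simple way to act on these estimates; the threshold itself would have to be tuned.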
4.7 SY Apr 16
Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach
Armin Gießler, Pol Jané-Soneira, Sören Hohmann
This paper studies the data-driven synthesis of linear quadratic integral (LQI) controllers for continuous-time systems. The objective is to achieve optimal state-feedback control with integral action for reference tracking using only measured data. To this end, we derive a data-driven closed-loop parameterization of the augmented dynamics that incorporates the integral state while relying solely on input-state-output measurements of the underlying system. Based on this parameterization, a data-driven convex optimization problem is formulated whose solution yields the optimal linear quadratic regulator (LQR) feedback gain for the augmented system without explicit knowledge of the system matrices. In addition, a policy gradient flow is derived to compute the optimal controller within the space of stabilizing gains. The proposed approach enables data-driven optimal tracking control while avoiding explicit state augmentation in the data collection phase. The effectiveness of the method is demonstrated through a numerical example involving a distributed generation unit (DGU) in a DC microgrid.
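For orientation, the augmented dynamics with an integral state can be written out in the standard textbook form below. The symbols $A$, $B$, $C$, $z$, $r$, $K_x$, $K_z$ and the sign convention of the integrator are assumptions chosen for illustration; the paper's data-driven parameterization replaces explicit knowledge of $A$, $B$, $C$ by measured data.

$$
\dot{x} = Ax + Bu, \qquad y = Cx, \qquad \dot{z} = r - y,
$$

$$
\frac{d}{dt}\begin{bmatrix} x \\ z \end{bmatrix}
= \begin{bmatrix} A & 0 \\ -C & 0 \end{bmatrix}
\begin{bmatrix} x \\ z \end{bmatrix}
+ \begin{bmatrix} B \\ 0 \end{bmatrix} u
+ \begin{bmatrix} 0 \\ I \end{bmatrix} r,
\qquad u = -K_x x - K_z z .
$$

Applying LQR design to the augmented pair yields a state feedback with integral action, so that a constant reference $r$ is tracked without steady-state error whenever the closed loop is stable.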
41.9 OC Apr 16
Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance
Armin Gießler, Pol Jané-Soneira, Sören Hohmann
We study state-feedback design for continuous-time LTI systems with a control input and an external input-output pair. Our objective is to determine feedback gains that render the closed-loop system (strictly) passive with respect to the external port while minimizing the standard LQR cost in the disturbance-free case. The resulting constrained optimization problem is intractable due to bilinear matrix inequalities. We analyze the set of passivating gains, showing it is unbounded, possibly nonconvex, path-connected, and contractible. We propose an indirect approach, in which the set of passivating feedback gains is inner-approximated by a compact, convex polytope. A projected gradient flow is employed to compute a gain within this polytope that minimizes the LQR cost. Numerical examples illustrate the effectiveness of the method.
RO Dec 8, 2025
SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks
Florian Tretter, Daniel Flögel, Alexandru Vasilache et al.
Integrating autonomous mobile robots into human environments requires human-like decision-making and energy-efficient, event-based computation. Despite progress, neuromorphic methods are rarely applied to Deep Reinforcement Learning (DRL) navigation approaches due to unstable training. We address this gap with a hybrid socially integrated DRL actor-critic approach that combines Spiking Neural Networks (SNNs) in the actor with Artificial Neural Networks (ANNs) in the critic and a neuromorphic feature extractor to capture temporal crowd dynamics and human-robot interactions. Our approach enhances social navigation performance and reduces estimated energy consumption by approximately 1.69 orders of magnitude.
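To put the reported reduction in more familiar terms, a decrease of 1.69 orders of magnitude corresponds to the factor

$$
10^{1.69} \approx 49,
$$

that is, the estimated energy consumption drops to roughly $1/49 \approx 2\,\%$ of the reference value.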
19.0 OC Apr 30
Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations
Armin Gießler, Felix Thömmes, Sören Hohmann
This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral reinforcement learning (IRL) parameterization. The CL parameterization characterizes the closed-loop system via a matrix that satisfies equality constraints. While this parameterization has been extensively studied for discrete-time systems, we adapt key results to the continuous-time setting and develop a policy iteration (PI) scheme, derive a data-driven continuous-time algebraic Riccati equation (CARE), and introduce an alternative convex problem formulation. The IRL parameterization utilizes off-policy data to perform policy evaluation, which is then used for PI or value iteration. Within the IRL framework, we derive a policy gradient flow and propose convex reformulations of the LQR problem. Finally, we provide a unified treatment of these parameterizations that enables a systematic understanding of existing approaches and clarifies their structural relationships.
LG Jan 10, 2024
ReACT: Reinforcement Learning for Controller Parametrization using B-Spline Geometries
Thomas Rudolf, Daniel Flögel, Tobias Schürmann et al.
Robust and performant controllers are essential for industrial applications. However, deriving controller parameters for complex and nonlinear systems is challenging and time-consuming. To facilitate automatic controller parametrization, this work presents a novel approach using deep reinforcement learning (DRL) with N-dimensional B-spline geometries (BSGs). We focus on the control of parameter-variant systems, a class of systems with complex behavior which depends on the operating conditions. For this system class, gain-scheduling control structures are widely used in applications across industries due to well-known design principles. To facilitate the expensive controller parametrization task for these control structures, we deploy a DRL agent. Based on control system observations, the agent autonomously decides how to adapt the controller parameters. We make the adaptation process more efficient by introducing BSGs to map the controller parameters, which may depend on numerous operating conditions. To preprocess time-series data and extract a fixed-length feature vector, we use a long short-term memory (LSTM) neural network. Furthermore, this work contributes actor regularizations that are relevant to real-world environments which differ from training. Accordingly, we apply dropout layer normalization to the actor and critic networks of the truncated quantile critic (TQC) algorithm. To show our approach's working principle and effectiveness, we train and evaluate the DRL agent on the parametrization task of an industrial control structure with parameter lookup tables.
RO Mar 14, 2024
Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning
Daniel Flögel, Lars Fischer, Thomas Rudolf et al.
Mobile robots are being used on a large scale in various crowded situations and are becoming part of our society. The socially acceptable navigation behavior of a mobile robot with individual human consideration is an essential requirement for scalable applications and human acceptance. Deep Reinforcement Learning (DRL) approaches have recently been used to learn a robot's navigation policy and to model the complex interactions between robots and humans. We propose to divide existing DRL-based navigation approaches based on the robot's exhibited social behavior and distinguish between social collision avoidance with a lack of social behavior and socially aware approaches with explicit predefined social behavior. In addition, we propose a novel socially integrated navigation approach where the robot's social behavior is adaptive and emerges from the interaction with humans. The formulation of our approach is derived from a sociological definition, which states that social acting is oriented toward the acting of others. The DRL policy is trained in an environment where other agents interact in a socially integrated manner and reward the robot's behavior individually. The simulation results indicate that the proposed socially integrated navigation approach outperforms a socially aware approach in terms of ego navigation performance while significantly reducing the negative impact on all agents within the environment.
HC Nov 29, 2021
Human-machine Symbiosis: A Multivariate Perspective for Physically Coupled Human-machine Systems
Jairo Inga, Miriam Ruess, Jan Heinrich Robens et al.
The notion of symbiosis has been increasingly mentioned in research on physically coupled human-machine systems. Yet, a uniform specification of which aspects constitute human-machine symbiosis is missing. By combining the expertise of different disciplines, we elaborate on a multivariate perspective of symbiosis as the highest form of physically coupled human-machine systems. Four dimensions are considered: task, interaction, performance, and experience. First, human and machine work together to accomplish a common task conceptualized on both a decision and an action level (task dimension). Second, each partner possesses an internal representation of its own as well as the other partner's intentions and influence on the environment. This alignment, which is the core of the interaction, constitutes the symbiotic understanding between both partners and is the basis of a joint, highly coordinated and effective action (interaction dimension). Third, the symbiotic interaction leads to synergetic effects regarding the intention recognition and complementary strengths of the partners, resulting in a higher overall performance (performance dimension). Fourth, symbiotic systems specifically change the user's experiences, like flow, acceptance, sense of agency, and embodiment (experience dimension). This multivariate perspective is flexible and generic and is also applicable in diverse human-machine scenarios, helping to bridge barriers between different disciplines.
SY Oct 26, 2020
Adaptive Optimal Trajectory Tracking Control Applied to a Large-Scale Ball-on-Plate System
Florian Köpf, Sean Kille, Jairo Inga et al.
While many theoretical works concerning Adaptive Dynamic Programming (ADP) have been proposed, application results are scarce. Therefore, we design an ADP-based optimal trajectory tracking controller and apply it to a large-scale ball-on-plate system. Our proposed method incorporates an approximated reference trajectory instead of using setpoint tracking and allows constant offset terms to be compensated automatically. Due to the off-policy characteristics of the algorithm, the method requires only a small amount of measured data to train the controller. Our experimental results show that this tracking mechanism significantly reduces the control cost compared to setpoint controllers. Furthermore, a comparison with a model-based optimal controller highlights the benefits of our model-free data-based ADP tracking controller, which requires neither a system model nor manual tuning because the controller is tuned automatically using measured data.
SY May 8, 2020
Multi-Robot Task Allocation and Scheduling Considering Cooperative Tasks and Precedence Constraints
Esther Bischoff, Fabian Meyer, Jairo Inga et al.
In order to fully exploit the advantages inherent to cooperating heterogeneous multi-robot teams, sophisticated coordination algorithms are essential. Time-extended multi-robot task allocation approaches assign and schedule a set of tasks to a group of robots such that certain objectives are optimized and operational constraints are met. This is particularly challenging if cooperative tasks, i.e. tasks that require two or more robots to work directly together, are considered. In this paper, we present an easy-to-implement criterion to validate the feasibility, i.e. executability, of solutions to time-extended multi-robot task allocation problems with cross schedule dependencies arising from the consideration of cooperative tasks and precedence constraints. Using the introduced feasibility criterion, we propose a local improvement heuristic based on a neighborhood operator for the problem class under consideration. The initial solution is obtained by a greedy constructive heuristic. Both methods use a generalized cost structure and are therefore able to handle various objective function instances. We evaluate the proposed approach using test scenarios of different problem sizes, all comprising the complexity aspects of the regarded problem. The simulation results illustrate the improvement potential arising from the application of the local improvement heuristic.
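To illustrate why cross-schedule dependencies make feasibility non-trivial, the sketch below checks a necessary condition for executability: the ordering imposed by the per-robot schedules together with the precedence constraints must be acyclic, otherwise robots wait on each other forever. This is one plausible check implemented with Kahn's topological sort, not necessarily the criterion proposed in the paper; the data layout (a cooperative task simply appears in several robots' schedules) and all names are assumptions.

```python
from collections import defaultdict, deque


def is_executable(schedules, precedence):
    """schedules: robot -> ordered task list; precedence: (before, after) pairs."""
    successors = defaultdict(set)
    tasks = set()
    for order in schedules.values():
        tasks.update(order)
        # Consecutive tasks of one robot must be executed in the given order.
        for earlier, later in zip(order, order[1:]):
            successors[earlier].add(later)
    for before, after in precedence:
        tasks.update((before, after))
        successors[before].add(after)

    indegree = {task: 0 for task in tasks}
    for following in successors.values():
        for task in following:
            indegree[task] += 1

    # Kahn's algorithm: repeatedly start tasks whose predecessors are all done.
    ready = deque(task for task, degree in indegree.items() if degree == 0)
    completed = 0
    while ready:
        task = ready.popleft()
        completed += 1
        for nxt in successors.get(task, ()):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return completed == len(tasks)  # False means a cyclic wait (deadlock)


# "C" is cooperative: both robots must be present; "A" must precede "B".
feasible = is_executable({"r1": ["A", "C"], "r2": ["B", "C"]}, [("A", "B")])
# Both tasks are cooperative but the robots visit them in opposite order.
deadlock = is_executable({"r1": ["A", "B"], "r2": ["B", "A"]}, [])
```

In the second case each robot waits at its first task for the other robot, so no task can ever start. A local improvement heuristic could use such a check to reject neighborhood moves that create a cyclic wait.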
MA Oct 29, 2019
Deep Decentralized Reinforcement Learning for Cooperative Control
Florian Köpf, Samuel Tesfazgi, Michael Flad et al.
In order to collaborate efficiently with unknown partners in cooperative control settings, adaptation of the partners based on online experience is required. The rather general and widely applicable control setting, where each cooperation partner might strive for individual goals while the control laws and objectives of the partners are unknown, entails various challenges such as the non-stationarity of the environment, the multi-agent credit assignment problem, the alter-exploration problem and the coordination problem. We propose new, modular deep decentralized Multi-Agent Reinforcement Learning mechanisms to account for these challenges. To this end, our method uses a time-dependent prioritization of samples, incorporates a model of the system dynamics and utilizes variable, accountability-driven learning rates and simulated, artificial experiences in order to guide the learning process. The effectiveness of our method is demonstrated by means of a simulated, nonlinear cooperative control task.
SY Sep 16, 2019
Adaptive Dynamic Programming for Model-free Tracking of Trajectories with Time-varying Parameters
Florian Köpf, Simon Ramsteiner, Michael Flad et al.
In order to autonomously learn to control unknown systems optimally w.r.t. an objective function, Adaptive Dynamic Programming (ADP) is well-suited to adapt controllers based on experience from interaction with the system. In recent years, many researchers focused on the tracking case, where the aim is to follow a desired trajectory. So far, ADP tracking controllers assume that the reference trajectory follows time-invariant exo-system dynamics, an assumption that does not hold for many applications. In order to overcome this limitation, we propose a new Q-function which explicitly incorporates a parametrized approximation of the reference trajectory. This allows a general class of trajectories to be tracked by means of ADP. Once our Q-function has been learned, the associated controller copes with time-varying reference trajectories without the need for further training and independently of exo-system dynamics. After proposing our general model-free off-policy tracking method, we provide an analysis of the important special case of linear quadratic tracking. We conclude our paper with an example which demonstrates that our new method successfully learns the optimal tracking controller and outperforms existing approaches in terms of tracking error and cost.
SY Sep 9, 2019
Partner Approximating Learners (PAL): Simulation-Accelerated Learning with Explicit Partner Modeling in Multi-Agent Domains
Florian Köpf, Alexander Nitsch, Michael Flad et al.
Mixed cooperative-competitive control scenarios such as human-machine interaction with individual goals of the interacting partners are very challenging for reinforcement learning agents. In order to contribute towards intuitive human-machine collaboration, we focus on problems in the continuous state and control domain where no explicit communication is considered and the agents do not know the others' goals or control laws but only sense their control inputs retrospectively. Our proposed framework combines a learned partner model based on online data with a reinforcement learning agent that is trained in a simulated environment including the partner model. Thus, we overcome drawbacks of independent learners and, in addition, benefit from a reduced amount of real-world data required for reinforcement learning, which is vital in the human-machine context. Finally, we analyze an example that demonstrates the merits of our proposed framework, which learns fast due to the simulated environment and adapts to the continuously changing partner due to the partner approximation.
SY Jun 12, 2019
Adaptive Optimal Control for Reference Tracking Independent of Exo-System Dynamics
Florian Köpf, Johannes Westermann, Michael Flad et al.
Model-free control based on the idea of Reinforcement Learning is a promising approach that has recently gained extensive attention. However, Reinforcement-Learning-based control methods solely focus on the regulation problem or learn to track a reference that is generated by a time-invariant exo-system. In the latter case, controllers are only able to track the time-invariant reference dynamics which they have been trained on and need to be re-trained each time the reference dynamics change. Consequently, these methods fail in a number of applications that rely on trajectories not generated by an exo-system. One prominent example is autonomous driving. This paper provides, for the first time, an adaptive optimal control method capable of tracking reference trajectories that are not generated by a time-invariant exo-system. The main innovation is a novel Q-function that directly incorporates a given reference trajectory on a moving horizon. This new Q-function exhibits a particular structure which allows the design of an efficient, iterative, provably convergent Reinforcement Learning algorithm that enables optimal tracking. Two real-world examples demonstrate the effectiveness of our new method.