Marco Caccamo

LG
h-index54
37papers
629citations
Novelty51%
AI Score57

37 Papers

LGAug 28, 2023Code
Edge Generation Scheduling for DAG Tasks Using Deep Reinforcement Learning

Binqi Sun, Mirco Theile, Ziyuan Qin et al.

Directed acyclic graph (DAG) tasks are currently adopted in the real-time domain to model complex applications from the automotive, avionics, and industrial domains that implement their functionalities through chains of intercommunicating tasks. This paper studies the problem of scheduling real-time DAG tasks by presenting a novel schedulability test based on the concept of trivial schedulability. Using this schedulability test, we propose a new DAG scheduling framework (edge generation scheduling -- EGS) that attempts to minimize the DAG width by iteratively generating edges while guaranteeing the deadline constraint. We study how to efficiently solve the problem of generating edges by developing a deep reinforcement learning algorithm combined with a graph representation neural network to learn an efficient edge generation policy for EGS. We evaluate the effectiveness of the proposed algorithm by comparing it with state-of-the-art DAG scheduling heuristics and an optimal mixed-integer linear programming baseline. Experimental results show that the proposed algorithm outperforms the state-of-the-art by requiring fewer processors to schedule the same DAG tasks. The code is available at https://github.com/binqi-sun/egs.

ROAug 30, 2022Code
Verifiable Obstacle Detection

Ayoosh Bansal, Hunmin Kim, Simon Yu et al.

Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks. In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data.

ROSep 6, 2023
Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

Mirco Theile, Harald Bayerlein, Marco Caccamo et al.

Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable challenge emerges from integrating recharge journeys into the overall coverage strategy, highlighting the intricate task of making strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic, generalizes to different target zones and maps, with limited generalization to unseen maps. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.

LGJun 3, 2023
Model-aided Federated Reinforcement Learning for Multi-UAV Trajectory Planning in IoT Networks

Jichao Chen, Omid Esrafilian, Harald Bayerlein et al.

Deploying teams of unmanned aerial vehicles (UAVs) to harvest data from distributed Internet of Things (IoT) devices requires efficient trajectory planning and coordination algorithms. Multi-agent reinforcement learning (MARL) has emerged as a solution, but requires extensive and costly real-world training data. To tackle this challenge, we propose a novel model-aided federated MARL algorithm to coordinate multiple UAVs on a data harvesting mission with only limited knowledge about the environment. The proposed algorithm alternates between building an environment simulation model from real-world measurements, specifically learning the radio channel characteristics and estimating unknown IoT device positions, and federated QMIX training in the simulated environment. Each UAV agent trains a local QMIX model in its simulated environment and continuously consolidates it through federated learning with other agents, accelerating the learning process. A performance comparison with standard MARL algorithms demonstrates that our proposed model-aided FedQMIX algorithm reduces the need for real-world training experiences by around three magnitudes while attaining similar data collection performance.

SYDec 9, 2018
Software Fault Tolerance for Cyber-Physical Systems via Full System Restart

Pushpak Jagtap, Fardin Abdi, Matthias Rungger et al.

The paper addresses the issue of reliability of complex embedded control systems in the safety-critical environment. In this paper, we propose a novel approach to design controller that (i) guarantees the safety of nonlinear physical systems, (ii) enables safe system restart during runtime, and (iii) allows the use of complex, unverified controllers (e.g., neural networks) that drive the physical systems towards complex specifications. We use abstraction-based controller synthesis approach to design a formally verified controller that provides application and system-level fault tolerance along with safety guarantee. Moreover, our approach is implementable using commercial-off-the-shelf (COTS) processing unit. To demonstrate the efficacy of our solution and to verify the safety of the system under various types of faults injected in applications and in the underlying real-time operating system (RTOS), we implemented the proposed controller for the inverted pendulum and three degree-of-freedom (3-DOF) helicopter.

CVAug 30, 2022
6IMPOSE: Bridging the Reality Gap in 6D Pose Estimation for Robotic Grasping

Hongpeng Cao, Lukas Dirnberger, Daniele Bernardini et al.

6D pose recognition has been a crucial factor in the success of robotic grasping, and recent deep learning based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To overcome this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE consists of four modules: First, a data generation pipeline that employs the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations. Second, an annotated RGBD dataset of five household objects generated using the proposed pipeline. Third, a real-time two-stage 6D pose estimation approach that integrates the object detector YOLO-V4 and a streamlined, real-time version of the 6D pose estimation algorithm PVN3D optimized for time-sensitive robotics applications. Fourth, a codebase designed to facilitate the integration of the vision system into a robotic grasping experiment. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% in grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by the fine-tuning of data generation and domain randomization techniques, and the optimization of the inference pipeline, overcoming the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, synthetic dataset, and all the pretrained models available on Github.

ARApr 14
Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

Denis Hoornaert, Cole Strickler, Manos Athanassoulis et al. · harvard

The shift to data-intensive processing from the cloud to the edge has introduced new challenges and expectations for the next generation of intelligent computing systems. As the memory wall continues to grow, modern systems can only meet these performance expectations by displaying data access patterns that exhibit ideal layouts in memory and ideal spatiotemporal locality in caches. However, only a few data-intensive applications are characterized by ideal locality. Instead, most applications exhibit either (i) poor locality when naively implemented and must undergo costly redesigns and tuning or (ii) inflated memory footprint to offer proper locality. To address the aforementioned challenges, we propose a hardware/software co-designed approach that can be implemented on commercially available SoC/FPGA platforms. Our approach seamlessly inserts in the CPUs' data path a Tensor Memory Engine that provides data with an ideal cache locality to running applications by (i) accessing the memory on behalf of the CPUs and (ii) composing a re-organized view of the memory layout. Unlike in- and near-memory computing approaches, it sets itself apart by clearly decoupling computing and memory accesses; computation is still performed on CPUs while the data re-organization is delegated to the Tensor Memory Engine.

ROFeb 14, 2023
Residual Policy Learning for Vehicle Control of Autonomous Racing Cars

Raphael Trumpp, Denis Hoornaert, Marco Caccamo

The development of vehicle controllers for autonomous racing is challenging because racing cars operate at their physical driving limit. Prompted by the demand for improved performance, autonomous racing research has seen the proliferation of machine learning-based controllers. While these approaches show competitive performance, their practical applicability is often limited. Residual policy learning promises to mitigate this drawback by combining classical controllers with learned residual controllers. The critical advantage of residual controllers is their high adaptability parallel to the classical controller's stable behavior. We propose a residual vehicle controller for autonomous racing cars that learns to amend a classical controller for the path-following of racing lines. In an extensive study, performance gains of our approach are evaluated for a simulated car of the F1TENTH autonomous racing series. The evaluation for twelve replicated real-world racetracks shows that the residual controller reduces lap times by an average of 4.55 % compared to a classical controller and even enables lap time gains on unknown racetracks.

LGMar 4, 2022
Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning

Hongpeng Cao, Mirco Theile, Federico G. Wyrwal et al.

Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment. However, the training of DRL policies requires large amounts of training experiences, making it impractical to learn the policy directly on physical systems. Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world. Unfortunately, the direct real-world deployment of pretrained policies usually suffers from performance deterioration due to the different dynamics, known as the reality gap. Recent sim-to-real methods, such as domain randomization and domain adaptation, focus on improving the robustness of the pretrained agents. Nevertheless, the simulation-trained policies often need to be tuned with real-world data to reach optimal performance, which is challenging due to the high cost of real-world samples. This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time. In the architecture, the inference and training are assigned to the edge and cloud, separating the real-time control loop from the computationally expensive training loop. To overcome the reality gap, our architecture exploits sim-to-real transfer strategies to continue the training of simulation-pretrained agents on a physical system. We demonstrate its applicability on a physical inverted-pendulum control system, analyzing critical parameters. The real-world experiments show that our architecture can adapt the pretrained DRL agents to unseen dynamics consistently and efficiently.

LGMar 29, 2023
Physical Deep Reinforcement Learning Towards Safety Guarantee

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.

SYMay 5, 2017
Restart-Based Fault-Tolerance: System Design and Schedulability Analysis

Fardin Abdi, Renato Mancuso, Rohan Tabish et al.

Embedded systems in safety-critical environments are continuously required to deliver more performance and functionality, while expected to provide verified safety guarantees. Nonetheless, platform-wide software verification (required for safety) is often expensive. Therefore, design methods that enable utilization of components such as real-time operating systems (RTOS), without requiring their correctness to guarantee safety, is necessary. In this paper, we propose a design approach to deploy safe-by-design embedded systems. To attain this goal, we rely on a small core of verified software to handle faults in applications and RTOS and recover from them while ensuring that timing constraints of safety-critical tasks are always satisfied. Faults are detected by monitoring the application timing and fault-recovery is achieved via full platform restart and software reload, enabled by the short restart time of embedded systems. Schedulability analysis is used to ensure that the timing constraints of critical plant control tasks are always satisfied in spite of faults and consequent restarts. We derive schedulability results for four restart-tolerant task models. We use a simulator to evaluate and compare the performance of the considered scheduling models.

ROSep 4, 2022
Perception Simplex: Verifiable Collision Avoidance in Autonomous Vehicles Amidst Obstacle Detection Faults

Ayoosh Bansal, Hunmin Kim, Simon Yu et al.

Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation. In this work, we propose Perception Simplex (PS), a fault-tolerant application architecture designed for obstacle detection and collision avoidance. We analyze an existing LiDAR-based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning-based perception systems yet. By employing verifiable obstacle detection algorithms, PS identifies obstacle existence detection faults in the output of unverifiable DNN-based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software-in-the-loop simulations, we demonstrate that PS provides predictable and deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.

LGJan 26, 2023
Learning to Generate All Feasible Actions

Mirco Theile, Daniele Bernardini, Raphael Trumpp et al.

Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.

LGMay 11Code
Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning

Raphael Trumpp, Ömer Veysel Çağatan, Barış Akgün et al.

Pixel-based deep reinforcement learning agents are typically trained on heavily downsampled visual observations, a convention inherited from early benchmarks rather than grounded in principled design. In this work, we show that observation resolution is a critical yet overlooked variable for policy learning: higher-resolution inputs can substantially improve both performance and generalization, provided the network architecture can process them effectively. We find that the widely used Impala encoder, which flattens spatial features into a vector, suffers from quadratic parameter growth as resolution increases and fails to leverage the additional visual detail. Replacing this operation with global average pooling, as in the Impoola architecture, decouples parameter count from resolution and yields consistent improvements across resolutions and network widths - at their respective best conditions, visual scaling unlocks a 28 % performance gain for Impoola over Impala. These gains are strongest in environments that require precise perception of small or distant objects, and gradient saliency analysis confirms that the underlying mechanism is a more spatially localized visual attention of the policy at higher resolutions. Our results challenge the prevailing practice of aggressive input downsampling and position resolution-independent architectures as a simple, effective path toward scalable visual deep RL. To facilitate future research on resolution scaling in deep RL, we publicly release the open-source code for the Procgen-HD benchmark: https://github.com/raphajaner/procgen-hd.

SYApr 19
Controlled Invariant Sets for Gaussian Process State Space Models

Paul Griffioen, Bingzhuo Zhong, Murat Arcak et al.

We compute probabilistic controlled invariant sets for nonlinear systems using Gaussian process state space models, which are data-driven models that account for unmodeled and unknown nonlinear dynamics. We propose a semidefinite programming scheme for designing state-feedback controllers that maximize the probability of the trajectories staying within a probabilistic controlled invariant set while satisfying input constraints. The results are validated on a quadrotor, both in simulation and on a physical platform.

LGMar 7, 2025Code
Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning

Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile et al.

As image-based deep reinforcement learning tackles more challenging tasks, increasing model size has become an important factor in improving performance. Recent studies achieved this by focusing on the parameter efficiency of scaled networks, typically using Impala-CNN, a 15-layer ResNet-inspired network, as the image encoder. However, while Impala-CNN evidently outperforms older CNN architectures, potential advancements in network design for deep reinforcement learning-specific image encoders remain largely unexplored. We find that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. This approach outperforms larger and more complex models in the Procgen Benchmark, particularly in terms of generalization. We call our proposed encoder model Impoola-CNN. A decrease in the network's translation sensitivity may be central to this improvement, as we observe the most significant gains in games without agent-centered observations. Our results demonstrate that network scaling is not just about increasing model size - efficient network design is also an essential factor. We make our code available at https://github.com/raphajaner/impoola.

PFMar 18
ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM

Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano et al.

The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging. In this work, we demonstrate that it can be repurposed to implement a novel hardware-assisted memory bandwidth regulator, providing a portable and effective solution to mitigate memory interference in real-time multicore systems. ETM2 requires minimal software intervention and bridges the gap between the fine-grained microsecond resolution of MemPol and the portability and reaction time of interrupt-based solutions, such as MemGuard. We assess the effectiveness and portability of our design with an evaluation on a large number of 64-bit Arm boards, and we compare ETM2 with previous works using a setup based on the San Diego Vision Benchmark Suite on the AMD Zynq UltraScale+. Our results show the scalability of the approach and highlight the design trade-offs it enables. ETM2 is effective in enforcing per-core memory bandwidth regulation and unlocks new regulation options that were infeasible under MemGuard and MemPol.

ROMar 13Code
Efficient Real-World Autonomous Racing via Attenuated Residual Policy Optimization

Raphael Trumpp, Denis Hoornaert, Mirco Theile et al.

Residual policy learning (RPL), in which a learned policy refines a static base policy using deep reinforcement learning (DRL), has shown strong performance across various robotic applications. Its effectiveness is particularly evident in autonomous racing, a domain that serves as a challenging benchmark for real-world DRL. However, deploying RPL-based controllers introduces system complexity and increases inference latency. We address this by introducing an extension of RPL named attenuated residual policy optimization ($α$-RPO). Unlike standard RPL, $α$-RPO yields a standalone neural policy by progressively attenuating the base policy, which initially serves to bootstrap learning. Furthermore, this mechanism enables a form of privileged learning, where the base policy is permitted to use sensor modalities not required for final deployment. We design $α$-RPO to integrate seamlessly with PPO, ensuring that the attenuated influence of the base controller is dynamically compensated during policy optimization. We evaluate $α$-RPO by building a framework for 1:10-scaled autonomous racing around it. In both simulation and zero-shot real-world transfer to Roboracer cars, $α$-RPO not only reduces system complexity but also improves driving performance compared to baselines - demonstrating its practicality for robotic deployment. Our code is available at: https://github.com/raphajaner/arpo_racing.

LGSep 5, 2024
Simplex-enabled Safe Continual Learning Machine

Hongpeng Cao, Yanbing Mao, Yihao Cai et al.

This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, ``using simplicity to control complexity'') and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Coordinator. Specifically, the HP-Student is a pre-trained high-performance but not fully verified Phy-DRL, continuing to learn in a real plant to tune the action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complementary, HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.

DCMar 15, 2024
Strict Partitioning for Sporadic Rigid Gang Tasks

Binqi Sun, Tomasz Kloda, Marco Caccamo

The rigid gang task model is based on the idea of executing multiple threads simultaneously on a fixed number of processors to increase efficiency and performance. Although there is extensive literature on global rigid gang scheduling, partitioned approaches have several practical advantages (e.g., task isolation and reduced scheduling overheads). In this paper, we propose a new partitioned scheduling strategy for rigid gang tasks, named strict partitioning. The method creates disjoint partitions of tasks and processors to avoid inter-partition interference. Moreover, it tries to assign tasks with similar volumes (i.e., parallelisms) to the same partition so that the intra-partition interference can be reduced. Within each partition, the tasks can be scheduled using any type of scheduler, which allows the use of a less pessimistic schedulability test. Extensive synthetic experiments and a case study based on Edge TPU benchmarks show that strict partitioning achieves better schedulability performance than state-of-the-art global gang schedulability analyses for both preemptive and non-preemptive rigid gang task sets.

LGDec 5, 2024
Action Mapping for Reinforcement Learning in Continuous Environments with Constraints

Mirco Theile, Lukas Dirnberger, Raphael Trumpp et al.

Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature explored incorporating model knowledge to mitigate these problems, particularly through the use of models that assess the feasibility of proposed actions. However, integrating feasibility models efficiently into DRL pipelines in environments with continuous action spaces is non-trivial. We propose a novel DRL training strategy utilizing action mapping that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.

ROMay 13, 2025
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning

Mirco Theile, Andres R. Zapata Rodriguez, Marco Caccamo et al.

Unmanned Aerial Vehicle (UAV) Coverage Path Planning (CPP) is critical for applications such as precision agriculture and search and rescue. While traditional methods rely on discrete grid-based representations, real-world UAV operations require power-efficient continuous motion planning. We formulate the UAV CPP problem in a continuous environment, minimizing power consumption while ensuring complete coverage. Our approach models the environment with variable-size axis-aligned rectangles and UAV motion with curvature-constrained Bézier curves. We train a reinforcement learning agent using an action-mapping-based Soft Actor-Critic (AM-SAC) algorithm employing a self-adaptive curriculum. Experiments on both procedurally generated and hand-crafted scenarios demonstrate the effectiveness of our method in learning energy-efficient coverage strategies.

LGJul 25, 2025
Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization

Yuliang Gu, Hongpeng Cao, Marco Caccamo et al.

Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy: observation sufficiency (preserving all predictive information) and control sufficiency (retaining decision-making relevant information). Exploiting this dichotomy, we derive a contextual evidence lower bound(ELBO)-style objective that cleanly separates representation learning from policy learning and optimizes it with Bottlenecked Contextual Policy Optimization (BCPO), an algorithm that places a variational information-bottleneck encoder in front of any off-policy policy learner. On standard continuous-control benchmarks with shifting physical parameters, BCPO matches or surpasses other baselines while using fewer samples and retaining performance far outside the training regime. The framework unifies theory, diagnostics, and practice for context-based RL.

CVJul 7, 2025
From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp et al.

Accurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent's future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/.

LGJun 2, 2025
Bregman Centroid Guided Cross-Entropy Method

Yuliang Gu, Hongpeng Cao, Marco Caccamo et al.

The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages $\textit{Bregman centroids}$ for principled information aggregation and diversity control. $\textbf{$\mathcal{BC}$-EvoCEM}$ computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\textbf{$\mathcal{BC}$-EvoCEM}$ integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\textbf{$\mathcal{BC}$-EvoCEM}$ enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.

RODec 17, 2024
Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

Real-world accidents in learning-enabled CPS frequently occur in challenging corner cases. During the training of deep reinforcement learning (DRL) policy, the standard setup for training conditions is either fixed at a single initial condition or uniformly sampled from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, a simulated and a real quadruped robot, showing remarkably improved sampling efficiency to learn more robust safe policies.

LGMar 19, 2024
Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

Mirco Theile, Hongpeng Cao, Marco Caccamo et al.

In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.

AIMay 26, 2023
Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

Hongpeng Cao, Yanbing Mao, Lui Sha et al.

This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) residual action policy (i.e., integrating data-driven-DRL action policy and physics-model-based action policy), ii) automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely data-driven DRL and solely model-based design while offering remarkably fewer learning parameters and fast training towards safety guarantee.

ROJul 21, 2021
Multi-Agent Belief Sharing through Autonomous Hierarchical Multi-Level Clustering

Mirco Theile, Jonathan Ponniah, Or Dantsker et al.

Coordination in multi-agent systems is challenging for agile robots such as unmanned aerial vehicles (UAVs), where relative agent positions frequently change due to unconstrained movement. The problem is exacerbated through the individual take-off and landing of agents for battery recharging leading to a varying number of active agents throughout the whole mission. This work proposes autonomous hierarchical multi-level clustering (MLC), which forms a clustering hierarchy utilizing decentralized methods. Through periodic cluster maintenance executed by cluster-heads, stable multi-level clustering is achieved. The resulting hierarchy is used as a backbone to solve the communication problem for locally-interactive applications such as UAV tracking problems. Using observation aggregation, compression, and dissemination, agents share local observations throughout the hierarchy, giving every agent a total system belief with spatially dependent resolution and freshness. Extensive simulations show that MLC yields a stable cluster hierarchy under different motion patterns and that the proposed belief sharing is highly applicable in wildfire front monitoring scenarios.

ROJun 8, 2021
Risk Ranked Recall: Collision Safety Metric for Object Detection Systems in Autonomous Vehicles

Ayoosh Bansal, Jayati Singh, Micaela Verucchi et al.

Commonly used metrics for evaluation of object detection systems (precision, recall, mAP) do not give complete information about their suitability of use in safety critical tasks, like obstacle detection for collision avoidance in Autonomous Vehicles (AV). This work introduces the Risk Ranked Recall ($R^3$) metrics for object detection systems. The $R^3$ metrics categorize objects within three ranks. Ranks are assigned based on an objective cyber-physical model for the risk of collision. Recall is measured for each rank.

CRApr 9, 2021
SchedGuard: Protecting against Schedule Leaks Using Linux Containers

Jiyang Chen, Tomasz Kloda, Ayoosh Bansal et al.

Real-time systems have recently been shown to be vulnerable to timing inference attacks, mainly due to their predictable behavioral patterns. Existing solutions such as schedule randomization lack the ability to protect against such attacks, often limited by the system's real-time nature. This paper presents SchedGuard: a temporal protection framework for Linux-based hard real-time systems that protects against posterior scheduler side-channel attacks by preventing untrusted tasks from executing during specific time segments. SchedGuard is integrated into the Linux kernel using cgroups, making it amenable to use with container frameworks. We demonstrate the effectiveness of our system using a realistic radio-controlled rover platform and synthetically generated workloads. Not only is SchedGuard able to protect against the attacks mentioned above, but it also ensures that the real-time tasks/containers meet their temporal requirements.

MAOct 23, 2020
Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning

Harald Bayerlein, Mirco Theile, Marco Caccamo et al.

Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, number, position and data amount of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.

ROOct 14, 2020
UAV Path Planning using Global and Local Map Information with Deep Reinforcement Learning

Mirco Theile, Harald Bayerlein, Richard Nai et al.

Path planning methods for autonomous unmanned aerial vehicles (UAVs) are typically designed for one specific type of mission. This work presents a method for autonomous UAV path planning based on deep reinforcement learning (DRL) that can be applied to a wide range of mission scenarios. Specifically, we compare coverage path planning (CPP), where the UAV's goal is to survey an area of interest to data harvesting (DH), where the UAV collects data from distributed Internet of Things (IoT) sensor devices. By exploiting structured map information of the environment, we train double deep Q-networks (DDQNs) with identical architectures on both distinctly different mission scenarios to make movement decisions that balance the respective mission goal with navigation constraints. By introducing a novel approach exploiting a compressed global map of the environment combined with a cropped but uncompressed local map showing the vicinity of the UAV agent, we demonstrate that the proposed method can efficiently scale to large environments. We also extend previous results for generalizing control policies that require no retraining when scenario parameters change and offer a detailed analysis of crucial map processing parameters' effects on path planning performance.

LGJul 1, 2020
UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Harald Bayerlein, Mirco Theile, Marco Caccamo et al.

Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time, change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.

ROMar 5, 2020
UAV Coverage Path Planning under Varying Power Constraints using Deep Reinforcement Learning

Mirco Theile, Harald Bayerlein, Richard Nai et al.

Coverage path planning (CPP) is the task of designing a trajectory that enables a mobile agent to travel over every point of an area of interest. We propose a new method to control an unmanned aerial vehicle (UAV) carrying a camera on a CPP mission with random start positions and multiple options for landing positions in an environment containing no-fly zones. While numerous approaches have been proposed to solve similar CPP problems, we leverage end-to-end reinforcement learning (RL) to learn a control policy that generalizes over varying power constraints for the UAV. Despite recent improvements in battery technology, the maximum flying range of small UAVs is still a severe constraint, which is exacerbated by variations in the UAV's power consumption that are hard to predict. By using map-like input channels to feed spatial information through convolutional network layers to the agent, we are able to train a double deep Q-network (DDQN) to make control decisions for the UAV, balancing limited power budget and coverage goal. The proposed method can be applied to a wide variety of environments and harmonizes complex goal structures with system constraints.

CRMay 3, 2017
Restart-Based Security Mechanisms for Safety-Critical Embedded Systems

Fardin Abdi, Chien-Ying Chen, Monowar Hasan et al.

Many physical plants that are controlled by embedded systems have safety requirements that need to be respected at all times - any deviations from expected behavior can result in damage to the system (often to the physical plant), the environment or even endanger human life. In recent times, malicious attacks against such systems have increased - many with the intent to cause physical damage. In this paper, we aim to decouple the safety of the plant from security of the embedded system by taking advantage of the inherent inertia in such systems. In this paper we present a system-wide restart-based framework that combines hardware and software components to (a) maintain the system within the safety region and (b) thwart potential attackers from destabilizing the system. We demonstrate the feasibility of our approach using two realistic systems - an actual 3 degree of freedom (3-DoF) helicopter and a simulated warehouse temperature control unit. Our proof-of-concept implementation is tested against multiple emulated attacks on the control units of these systems.

CRFeb 26, 2012
S3A: Secure System Simplex Architecture for Enhanced Security of Cyber-Physical Systems

Sibin Mohan, Stanley Bak, Emiliano Betti et al.

Until recently, cyber-physical systems, especially those with safety-critical properties that manage critical infrastructure (e.g. power generation plants, water treatment facilities, etc.) were considered to be invulnerable against software security breaches. The recently discovered 'W32.Stuxnet' worm has drastically changed this perception by demonstrating that such systems are susceptible to external attacks. Here we present an architecture that enhances the security of safety-critical cyber-physical systems despite the presence of such malware. Our architecture uses the property that control systems have deterministic execution behavior, to detect an intrusion within 0.6 μs while still guaranteeing the safety of the plant. We also show that even if an attack is successful, the overall state of the physical system will still remain safe. Even if the operating system's administrative privileges have been compromised, our architecture will still be able to protect the physical system from coming to harm.