RO · May 26
PEACE: A Planner-Executor Agent with Constraint Enforcement for UAVs
Erdem Uysal, Timo Kehrer, Sebastiano Panichella
Foundation models are increasingly used to drive autonomous systems, yet existing approaches either keep the model in a tight control loop, raising latency and hallucination risk, or compile natural language into opaque end-to-end policies that are hard to explain and constrain and that require domain-specific datasets and fine-tuning. We propose a planner-executor agent for PX4-based drones that decouples high-level mission planning from low-level control. A large language model performs single-pass task planning, while execution is handled through a structured ROS 2 tool-calling interface bridged to MAVLink. The system constructs a world model by combining modular 2D detectors (e.g., YOLO or vision-language models) with a pinhole depth projection module for 3D object localization. A constraint enforcement layer enforces altitude limits and horizontal geofencing, and bounded replanning enables recovery from execution-time action failures. We position our approach within three common design patterns for foundation-model-based robotics systems and demonstrate its feasibility in PX4 software-in-the-loop simulations in Gazebo. Results highlight improved explainability, constraint enforcement, and fewer LLM calls compared to tightly coupled LLM control. The code, dataset, videos, and other material can be found at the following link: https://github.com/erdemuysalx/PEACE
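The 3D localization and constraint steps described above can be illustrated with the standard pinhole back-projection followed by a simple waypoint check. This is a minimal sketch, not PEACE's implementation: the `Intrinsics` and `Limits` types, the rectangular geofence, and the reject-rather-than-clamp policy are assumptions for illustration, and the camera-to-world transform is omitted.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels along x
    fy: float  # focal length in pixels along y
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def back_project(u: float, v: float, depth: float, k: Intrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) with metric depth to camera-frame (X, Y, Z) via the pinhole model."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


@dataclass
class Limits:
    # Hypothetical limits: an altitude band plus an axis-aligned horizontal geofence.
    min_alt: float
    max_alt: float
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def check_waypoint(x: float, y: float, alt: float, lim: Limits) -> bool:
    """Return True only if the waypoint is inside the altitude band and the geofence."""
    in_altitude = lim.min_alt <= alt <= lim.max_alt
    in_fence = lim.x_min <= x <= lim.x_max and lim.y_min <= y <= lim.y_max
    return in_altitude and in_fence
```

For example, with `fx = fy = 500`, `cx = 320`, `cy = 240`, a detection centred at pixel (420, 240) with 2 m depth lies 0.4 m to the right of the optical axis. A real system would more likely use polygonal geofences and PX4's own failsafe handling; the rectangle keeps the sketch short.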
LG · May 18
Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty
Prakash Aryan, Kaushik Raghupathruni, Timo Kehrer et al.
Simulation-based testing of self-driving cars (SDCs) typically relies on scripted or simplified pedestrian models that do not capture the heterogeneity and uncertainty of real human crossing behavior. This limits the realism of safety assessments, especially in scenarios involving jaywalking, which is governed by latent personality traits that the vehicle cannot observe. We hypothesize that jointly training pedestrians and the SDC with multi-agent reinforcement learning (MARL) produces more realistic interaction scenarios than training the SDC against fixed pedestrian policies, and that the resulting behavior gap between predictable and unpredictable crossings can be measured directly from trajectories. This paper describes a MARL environment in which an SDC and 12 pedestrians are co-trained using Multi-Agent Proximal Policy Optimization (MAPPO). Pedestrian locomotion follows scripted Dijkstra pathfinding, while an RL policy controls high-level go/wait decisions. Jaywalking probability depends on a per-pedestrian personality trait sampled at episode start and hidden from the SDC. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, compared to 35% goals and 33% collisions for the best rule-based baseline. A speed differential metric shows that the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating that jaywalking encounters were not anticipated. Jaywalking accounted for 13% of crossing events but was associated with 62% of collisions. Co-training with MARL pedestrians reduced collisions by 30% relative to single-agent RL, as pedestrians learned to wait when the SDC approached at speed.
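One plausible reading of the speed differential metric is the mean SDC speed near jaywalkers minus the mean SDC speed near crosswalk users, restricted to the 0-3 m band. The sketch below assumes per-step samples of distance, speed, and crossing type; the `Sample` record and `speed_differential` function are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    distance_m: float  # SDC-to-pedestrian distance at this time step
    sdc_speed: float   # SDC speed in m/s at this time step
    jaywalking: bool   # True if the pedestrian crossed outside a crosswalk


def speed_differential(samples: list[Sample], max_dist: float = 3.0) -> float:
    """Mean SDC speed near jaywalkers minus mean SDC speed near crosswalk users,
    using only samples with 0 <= distance <= max_dist."""
    near = [s for s in samples if 0.0 <= s.distance_m <= max_dist]
    jay = [s.sdc_speed for s in near if s.jaywalking]
    cross = [s.sdc_speed for s in near if not s.jaywalking]
    if not jay or not cross:
        raise ValueError("need samples of both crossing types within range")
    return mean(jay) - mean(cross)
```

A positive value means the vehicle was moving faster at close range to jaywalkers, which is how the abstract interprets its 2.65 m/s result.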
RO · May 14
MR-SLAM: Immersive Spatial Supervision for Multi-Robot Mapping via Mixed Reality
Prakash Aryan, Cem Erdogdu, Kavinaya Kumarchokkappan et al.
Operating a multi-robot fleet for simultaneous localization and mapping (SLAM) in applications such as building inspection or warehouse-aisle monitoring requires the operator to maintain spatial awareness of each robot's position and mapping state, a task that scales poorly on conventional 2D interfaces. We present MR-SLAM, a mixed reality (MR) system in which an operator wearing a Meta Quest 3 headset teleoperates three simulated TurtleBot3 robots through a passthrough view with real-world occlusion, while spatially anchored dashboard panels report mapping progress in situ. Each robot runs an independent SLAM Toolbox instance whose occupancy grid is merged in real time on a Robot Operating System 2 (ROS 2) back end. Across five 9-minute evaluation sessions, the system delivered scans at 8.83 ± 0.16 Hz, mapped 17.9 ± 0.8 m² of merged occupancy, and reached 94.7 ± 0.5% cross-instance occupancy consistency across robot pairs. An additional session recorded 6.3 ms median transform jitter and 26.7 m² coverage of a 41 m² grid. We position MR-SLAM as a reference implementation for combining passthrough mixed reality supervision with multi-robot SLAM on consumer hardware.
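Assuming each SLAM Toolbox instance produces a grid in the ROS occupancy convention (-1 unknown, 0-100 occupancy probability) and that the grids are already aligned to a common origin and resolution, merging, a cross-instance consistency score, and mapped area could be computed roughly as below. The merge rule, the occupancy threshold of 50, and the agreement definition are assumptions; MR-SLAM's exact metric definitions are not given in the abstract.

```python
UNKNOWN = -1
OCC_THRESHOLD = 50  # assumption: values >= 50 count as occupied


def classify(value: int) -> bool:
    """True for occupied, False for free (caller must exclude UNKNOWN)."""
    return value >= OCC_THRESHOLD


def merge(a: list[int], b: list[int]) -> list[int]:
    """Cell-wise merge of two aligned, row-major grids: keep whichever value is known,
    and prefer the more occupied value when both are known."""
    out = []
    for va, vb in zip(a, b):
        if va == UNKNOWN:
            out.append(vb)
        elif vb == UNKNOWN:
            out.append(va)
        else:
            out.append(max(va, vb))
    return out


def consistency(a: list[int], b: list[int]) -> float:
    """Fraction of cells known in both grids whose occupied/free label agrees."""
    pairs = [(va, vb) for va, vb in zip(a, b) if va != UNKNOWN and vb != UNKNOWN]
    if not pairs:
        raise ValueError("grids share no known cells")
    agree = sum(1 for va, vb in pairs if classify(va) == classify(vb))
    return agree / len(pairs)


def mapped_area(grid: list[int], resolution_m: float) -> float:
    """Area in m^2 covered by known cells, given the cell edge length in metres."""
    known = sum(1 for v in grid if v != UNKNOWN)
    return known * resolution_m ** 2
```

Taking the maximum when both robots know a cell is a conservative choice (an obstacle seen by either robot is kept); averaging or log-odds fusion are common alternatives.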
SE · Mar 26, 2025
Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations
Pooja Rani, Jan-Andrea Bard, June Sallou et al.
The rapid technological evolution has accelerated software development across domains and use cases, contributing to a growing share of global carbon emissions. While recent large language models (LLMs) claim to assist developers in optimizing code for performance and energy efficiency, their efficacy in real-world scenarios remains underexplored. In this work, we explore the effectiveness of LLMs in reducing the environmental footprint of real-world projects, focusing on software written in Matlab, which is widely used in both academia and industry for scientific and engineering applications. We analyze energy-focused optimization on 400 scripts across 100 top GitHub repositories. We examine 2,176 potential optimizations recommended by leading LLMs, such as GPT-3, GPT-4, Llama, and Mixtral, and by a senior Matlab developer, assessing their impact on energy consumption, memory usage, execution time, and code correctness. The developer serves as a real-world baseline for comparing typical human and LLM-generated optimizations. Mapping these optimizations to 13 high-level themes, we found that LLMs propose a broad spectrum of improvements beyond energy efficiency, including code readability and maintainability, memory management, and error handling, whereas the developer overlooked some areas, such as parallel processing and error handling. However, our statistical tests reveal that the energy-focused optimizations unexpectedly had a negative impact on memory usage, with no clear benefits regarding execution time or energy consumption. Our qualitative analysis of energy-time trade-offs revealed that some themes, such as vectorization and preallocation, were among the common themes shaping these trade-offs. With LLMs becoming ubiquitous in modern software development, our study serves as a call to action: prioritizing the evaluation of common coding practices to identify the green ones.
CR · Jan 20, 2022
VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python
Laura Wartschinski, Yannic Noller, Thomas Vogel et al.
Context: Identifying potentially vulnerable code is important to improve the security of our software systems. However, manual detection of software vulnerabilities requires expert knowledge and is time-consuming, so it must be supported by automated techniques. Objective: Such automated vulnerability detection techniques should achieve high accuracy, point developers directly to the vulnerable code fragments, scale to real-world software, generalize across the boundaries of a specific software project, and require no or only moderate setup or configuration effort. Method: In this article, we present VUDENC (Vulnerability Detection with Deep Learning on a Natural Codebase), a deep learning-based vulnerability detection tool that automatically learns features of vulnerable code from a large, real-world Python codebase. VUDENC applies a word2vec model to identify semantically similar code tokens and to provide a vector representation. A network of long short-term memory (LSTM) cells is then used to classify vulnerable code token sequences at a fine-grained level, highlight the specific areas in the source code that are likely to contain vulnerabilities, and provide confidence levels for its predictions. Results: To evaluate VUDENC, we used 1,009 vulnerability-fixing commits from different GitHub repositories that contain seven different types of vulnerabilities (SQL injection, XSS, Command injection, XSRF, Remote code execution, Path disclosure, Open redirect) for training. In the experimental evaluation, VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%. VUDENC's code, the datasets for the vulnerabilities, and the Python corpus for the word2vec model are available for reproduction. Conclusions: Our experimental results suggest...
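The three reported metrics are related: precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is their harmonic mean. The helpers below are a generic sketch of these standard definitions, not VUDENC code. For instance, precision 0.82 and recall 0.78 give an F1 of about 0.80, matching the lower end of the reported F1 range, although the abstract does not say which precision and recall values belong together.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, so a detector with very high precision but low recall still scores poorly.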
SE · Aug 2, 2021
Learning Domain-Specific Edit Operations from Model Repositories with Frequent Subgraph Mining
Christof Tinnes, Timo Kehrer, Mitchell Joblin et al.
Model transformations play a fundamental role in model-driven software development. They can be used to solve or support central tasks, such as creating models, handling model co-evolution, and model merging. In the past, various (semi-)automatic approaches have been proposed to derive model transformations from meta-models or from examples. These approaches require time-consuming handcrafting or recording of concrete examples, or they are unable to derive complex transformations. We propose a novel unsupervised approach, called Ockham, which is able to learn edit operations from model histories in model repositories. Ockham is based on the idea that meaningful edit operations will be the ones that compress the model differences. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We find that our approach is able to discover frequent edit operations that have actually been applied. Furthermore, Ockham is able to extract edit operations in an industrial setting that are meaningful to practitioners.
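Ockham's premise is that meaningful edit operations are the ones that compress model differences. The sketch below illustrates only that intuition, with a deliberate simplification: instead of mining subgraphs of difference graphs, as Ockham does, it treats each difference as a set of atomic changes (hypothetical labels such as `add:Class`) and scores co-occurring change sets with a toy description-length gain.

```python
from collections import Counter
from itertools import combinations


def candidate_patterns(diffs: list[list[str]], max_size: int = 3) -> Counter:
    """Support of each combination of 2..max_size atomic changes (number of diffs containing it)."""
    support: Counter = Counter()
    for diff in diffs:
        items = sorted(set(diff))
        for k in range(2, max_size + 1):
            for combo in combinations(items, k):
                support[combo] += 1
    return support


def compression_gain(size: int, support: int) -> int:
    """Toy gain: each occurrence costs `size` units when stored explicitly, but only one
    reference unit once the pattern is defined, which itself costs `size` units once."""
    return support * size - (size + support)


def best_edit_operations(diffs: list[list[str]], top: int = 3) -> list[tuple[str, ...]]:
    """Change sets with positive gain, highest gain first."""
    support = candidate_patterns(diffs)
    scored = [(compression_gain(len(p), s), p) for p, s in support.items()]
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(reverse=True)
    return [p for _, p in scored[:top]]
```

If three diffs each add a class together with an attribute, that pair becomes the only pattern with positive gain and is proposed as a candidate composite edit operation. One-off combinations cost more to define than they save.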
SE · Feb 5, 2018
Shadow Symbolic Execution with Java PathFinder
Yannic Noller, Hoang Lam Nguyen, Minxing Tang et al.
Regression testing ensures that an evolving software system still performs correctly and that changes introduce no unintended side effects. However, creating regression test cases that expose divergent behavior requires considerable effort. One solution is shadow symbolic execution, originally implemented on top of KLEE for programs written in C, which takes a unified version of the old and the new program and performs symbolic execution guided by concrete values to explore the changed behavior. In this work, we apply the idea of shadow symbolic execution to Java programs and, hence, provide an extension of the Java PathFinder (JPF) project that performs shadow symbolic execution on Java bytecode. The extension has been applied to several subjects from the JPF test classes, where it successfully generated test inputs that expose divergences relevant for regression testing.
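Shadow symbolic execution works on a unified program in which each change is annotated with both its old and its new expression. The sketch below conveys only the unified-program idea: a hypothetical `change(old, new)` helper selects one version, and a brute-force scan over concrete inputs stands in for the symbolic exploration and constraint solving that the KLEE- and JPF-based tools perform.

```python
from typing import Callable, Iterable


def make_change(version: str) -> Callable[[object, object], object]:
    """Return a change(old, new) selector for the 'old' or 'new' program version."""
    def change(old, new):
        return new if version == "new" else old
    return change


def unified_program(x: int, change) -> str:
    # Hypothetical patch: the old code tested x > 10, the new code tests x >= 10.
    if change(x > 10, x >= 10):
        return "big"
    return "small"


def divergent_inputs(program, inputs: Iterable[int]) -> list[int]:
    """Inputs on which the old and new versions of the unified program disagree."""
    old, new = make_change("old"), make_change("new")
    return [x for x in inputs if program(x, old) != program(x, new)]
```

Scanning 0-19 finds only `x = 10`, the single input that separates the two versions. A symbolic engine would derive the same input from the path condition `x >= 10 and not (x > 10)` instead of enumerating values.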
LO · Dec 22, 2017
Proceedings Third Workshop on Graphs as Models
Timo Kehrer, Alice Miller
Graphs are used as models in many areas of computer science and computer engineering. For example, graphs are used to represent syntax, control and data flow, dependency, state spaces, models such as UML and other types of domain-specific models, and social network graphs. In all of these examples, the graph serves as an intuitive yet mathematically precise foundation for many purposes, both in theory building and in practical applications. Graph-based models serve as an abstract communication medium and are used to describe various concepts and phenomena. Moreover, once such graph-based models are constructed, they can be analyzed and transformed to verify the correctness of static and dynamic properties, to discover new properties, to deeply study a particular domain of interest, or to produce new equivalent and/or optimized versions of graph-based models. The Graphs as Models (GaM) workshop series combines the strengths of two pre-existing workshop series, GT-VMT (Graph Transformation and Visual Modelling Techniques) and GRAPHITE (Graph Inspection and Traversal Engineering), but also solicits research from other related areas, such as social network analysis. GaM offers a platform for exchanging new ideas and results for active researchers in these areas, with a particular aim of boosting inter- and transdisciplinary research exploiting new applications of graphs as models in any area of computational science. This year (2017), the third edition of the GaM workshop was co-located with the European Joint Conferences on Theory and Practice of Software 2017 (ETAPS'17), held in Uppsala, Sweden.
SE · Dec 6, 2016
An EMOF-Compliant Abstract Syntax for Bigraphs
Timo Kehrer, Christos Tsigkanos, Carlo Ghezzi
Bigraphs are an emerging modeling formalism for structures in ubiquitous computing. Besides an algebraic notation, which can be adopted to provide an algebraic syntax for bigraphs, the bigraphical theory introduces a visual concrete syntax which is intuitive and unambiguous at the same time; the standard visual notation can be customized and thus tailored to domain-specific requirements. However, in contrast to modeling standards based on the Meta-Object Facility (MOF) and domain-specific languages typically used in model-driven engineering (MDE), the bigraphical theory lacks a precise definition of an abstract syntax for bigraphical modeling languages. As a consequence, available modeling and analysis tools use proprietary formats for representing bigraphs internally and persistently, which hampers the exchange of models across tool boundaries. Moreover, tools can hardly be integrated with standard MDE technologies in order to build sophisticated tool chains and modeling environments, as required for systematic engineering of large systems or for experimental work evaluating the bigraphical theory in real-world applications. To overcome this situation, we propose an abstract syntax for bigraphs which is compliant with the Essential MOF (EMOF) standard defined by the Object Management Group (OMG). We use typed graphs as a formal underpinning of EMOF-based models and present a canonical mapping which maps bigraphs to typed graphs in a natural way. We also discuss application-specific variation points in the graph-based representation of bigraphs. Following standard techniques from software product line engineering, we present a framework to customize the graph-based representation to support a variety of application scenarios.
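To make the idea of mapping bigraphs to typed graphs concrete, the sketch below encodes a bigraph's place graph as `parent` edges and its link graph as port nodes connected to edges or outer names. The node and edge type names, the use of control names as node types, and the omission of sites and inner names are assumptions for illustration; this is not the EMOF metamodel or canonical mapping proposed in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Bigraph:
    roots: list[str]                # root (region) identifiers
    controls: dict[str, str]        # node id -> control name
    parent: dict[str, str]          # place graph: node id -> parent (root or node)
    ports: dict[str, list[str]]     # link graph: node id -> link of each port, in order
    outer_names: set[str] = field(default_factory=set)  # links that are outer names


def to_typed_graph(b: Bigraph) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Return (nodes, edges): nodes maps id -> type, edges are (source, edge type, target)."""
    nodes: dict[str, str] = {}
    edges: list[tuple[str, str, str]] = []
    for r in b.roots:
        nodes[r] = "Root"
    for n, ctrl in b.controls.items():
        nodes[n] = ctrl  # hypothetical choice: one node type per control
    for n, p in b.parent.items():
        edges.append((n, "parent", p))
    for n, links in b.ports.items():
        for i, link in enumerate(links):
            port = f"{n}.p{i}"
            nodes[port] = "Port"
            edges.append((n, "hasPort", port))
            if link not in nodes:
                nodes[link] = "OuterName" if link in b.outer_names else "Edge"
            edges.append((port, "link", link))
    return nodes, edges
```

For example, a room nested in a root, with an agent inside the room and both sharing one `wifi` link, becomes a typed graph with two `parent` edges and two ports linked to a single `Edge` node. Making ports explicit nodes is one way to keep the link graph expressible with plain typed edges.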