Andreas A. Malikopoulos

SY
h-index: 11
31 papers
183 citations
Novelty: 47%
AI Score: 53

31 Papers

SY · Apr 14 · Code
Closed-Form Characterization of Constrained Double-Integrator Optimal Control

Filippos N. Tzortzoglou, Logan E. Beaver, Andreas A. Malikopoulos

We present a framework for predicting human driving behavior in mixed traffic where connected and automated vehicles (CAVs) coexist with human-driven vehicles (HDVs), and validate it using an open-source virtual reality (VR) testbed. We estimate the time-shift parameter of Newell's car-following model for individual drivers using Bayesian linear regression and derive analytical expressions for the mean and variance of predicted trajectories. These predictions are integrated into an optimal control framework for CAV trajectory planning. To address the scarcity of mixed-traffic data, we develop a VR platform supporting realistic, multi-user driving scenarios and provide a reproducible experimental framework with a dedicated tutorial website requiring only MATLAB and Unreal Engine. Results show our approach enables efficient HDV predictions, while the VR platform offers an accessible environment for studying human behavior in mixed traffic.
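As a rough illustration of the Bayesian linear regression step mentioned in this abstract, the sketch below fits a single scalar weight under a Gaussian prior with known noise variance, then computes the analytical predictive mean and variance at a new input. The function names, the prior, the noise variance, and the data are assumptions made for illustration; how the regressor maps to the time-shift parameter of Newell's car-following model is not specified here and may differ in the paper.

```python
def posterior(xs, ys, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Posterior mean and variance of w in y = w * x + noise (conjugate Gaussian case)."""
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    weighted = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    return weighted / precision, 1.0 / precision


def predict(x_new, post_mean, post_var, noise_var=1.0):
    """Predictive mean and variance of y at x_new."""
    return post_mean * x_new, noise_var + x_new * x_new * post_var


# Hypothetical observations (not data from the paper).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_mean, w_var = posterior(xs, ys, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
pred_mean, pred_var = predict(2.0, w_mean, w_var, noise_var=1.0)
```

The predictive variance combines the observation noise with the remaining uncertainty in the weight, which is the kind of quantity a downstream planner could use to build safety margins.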

SY · Jun 22, 2020
Zero-Shot Autonomous Vehicle Policy Transfer: From Simulation to Real-World via Adversarial Learning

Behdad Chalaki, Logan E. Beaver, Ben Remer et al.

In this article, we demonstrate a zero-shot transfer of an autonomous driving policy from simulation to the University of Delaware's scaled smart city using adversarial multi-agent reinforcement learning, in which an adversary attempts to decrease the net reward by perturbing both the inputs and outputs of the autonomous vehicles during training. We train the autonomous vehicles to coordinate with each other while crossing a roundabout in the presence of an adversary in simulation. The adversarial policy successfully reproduces the simulated behavior and incidentally outperforms, in terms of travel time, both a human-driving baseline and adversary-free trained policies. Finally, we demonstrate that the addition of adversarial training considerably improves the performance of the policies after transfer to the real world compared to Gaussian noise injection.

LG · Apr 1, 2023
Connected and Automated Vehicles in Mixed-Traffic: Learning Human Driver Behavior for Effective On-Ramp Merging

Nishanth Venkatesh, Viet-Anh Le, Aditya Dave et al.

Highway merging scenarios featuring mixed traffic conditions pose significant modeling and control challenges for connected and automated vehicles (CAVs) interacting with incoming on-ramp human-driven vehicles (HDVs). In this paper, we present an approach to learn an approximate information state model of CAV-HDV interactions for a CAV to maneuver safely during highway merging. In our approach, the CAV learns the behavior of an incoming HDV using approximate information states before generating a control strategy to facilitate merging. First, we validate the efficacy of this framework on real-world data by using it to predict the behavior of an HDV in mixed traffic situations extracted from the Next-Generation Simulation repository. Then, we generate simulation data for HDV-CAV interactions in a highway merging scenario using a standard inverse reinforcement learning approach. Without assuming prior knowledge of the generating model, we show that our approximate information state model learns to predict the future trajectory of the HDV using only observations. Subsequently, we generate safe control policies for a CAV while merging with HDVs, demonstrating a spectrum of driving behaviors, from aggressive to conservative. We demonstrate the effectiveness of the proposed approach by performing numerical simulations.

OC · May 24, 2019
Optimal Vehicle Dynamics and Powertrain Control for Connected and Automated Vehicles

Liuhui Zhao, A M Ishtiaque Mahbub, Andreas A. Malikopoulos

The implementation of connected and automated vehicle technologies enables opportunities for a novel computational framework for real-time control actions aimed at optimizing energy consumption and associated benefits. In this paper, we present a two-level control architecture for a connected and automated plug-in hybrid electric vehicle to optimize simultaneously its speed profile and powertrain efficiency. We evaluate the proposed architecture through simulation in a network of vehicles.

SY · May 24, 2019
On the Traffic Impacts of Optimally Controlled Connected and Automated Vehicles

Liuhui Zhao, Andreas A. Malikopoulos, Jackeline Rios-Torres

The implementation of connected and automated vehicle (CAV) technologies enables a novel computational framework for real-time control actions aimed at optimizing energy consumption and associated benefits. Several research efforts reported in the literature to date have proposed decentralized control algorithms to coordinate CAVs in various traffic scenarios, e.g., highway on-ramps, intersections, and roundabouts. However, the impact of optimally coordinating CAVs on the performance of a transportation network has not yet been thoroughly analyzed. In this paper, we apply a decentralized optimal control framework in a transportation network and compare its performance to a baseline scenario consisting of human-driven vehicles. We show that introducing CAVs yields radically improved roadway capacity and network performance.

SY · May 13
Accelerating Time-Optimal Trajectory Planning for Connected and Automated Vehicles with Graph Neural Networks

Viet-Anh Le, Andreas A. Malikopoulos

In this paper, we present a learning-based framework that accelerates time- and energy-optimal trajectory planning for connected and automated vehicles (CAVs) using graph neural networks (GNNs). We formulate the multi-agent coordination problem encountered in traffic scenarios as a cooperative trajectory planning problem that minimizes travel time, subject to motion primitives derived from energy-optimal solutions. The performance of this framework can be further improved through replanning at each time step, enabling the system to incorporate newly observed information. To achieve real-time execution, we employ a graph isomorphism network with edge features (GINEConv) to learn the solutions of the time-optimal trajectory planning problem from offline-generated data. The trained model produces online predictions that serve as warm-starts for numerical optimization, thereby enabling rapid computation of minimal exit times and the associated feasible trajectories. This learning-to-warm-start approach substantially reduces computation time while preserving the control performance of the time- and energy-optimal trajectory planning framework.

SY · Apr 15
Integrated Routing and Intersection Control for Mixed Traffic

Filippos N. Tzortzoglou, Pengbo Zhu, Andreas A. Malikopoulos

The rapid development of cyber-physical systems is driving a transition toward mixed traffic environments comprising both human-driven and connected and automated vehicles (CAVs). This shift presents a unique opportunity to leverage the efficient operation of CAVs to improve overall network throughput. This paper introduces a hierarchical framework designed to bridge macroscopic routing optimization at the network level with microscopic vicinity control at signalized intersections. The upper layer utilizes aggregated traffic information to provide proactive routing guidance for CAVs, aiming to minimize total travel time. The lower layer leverages local vehicle states to jointly optimize traffic light phases and individual CAV trajectories, aiming to reduce intersection crossing delays and optimize energy consumption, respectively. The effectiveness of the proposed framework is validated through SUMO on the Sioux Falls benchmark network. Results demonstrate that integrating the macroscopic and microscopic layers yields significantly better performance than applying either layer in isolation, improving network throughput and reducing congestion.

SY · Apr 2
Cooperative Detour Planning for Dual-Task Drone Fleets

Pengbo Zhu, Meng Xu, Andreas A. Malikopoulos et al.

As urban air mobility scales, commercial drone fleets offer a compelling yet underexplored opportunity to function as mobile sensor networks for real-time urban traffic monitoring. In this paper, we propose a decentralized framework that enables drone fleets to simultaneously execute delivery tasks and observe network traffic conditions. We model the urban environment with dynamic information values associated with road segments, which accumulate traffic condition uncertainty over time and are reset upon drone visitation. This problem is formulated as a mixed-integer linear programming problem where drones maximize the traffic information reward while respecting the maximum detour for each delivery and the battery budget of each drone. Unlike centralized approaches that are computationally heavy for large fleets, our method focuses on dynamic local clustering. When drones enter communication range, they exchange their beliefs about traffic status and transition from isolated path planning to a local joint optimization mode, resolving coupled constraints to obtain a replanned path for each drone. Simulation results built on the real city network of Barcelona, Spain, demonstrate that, compared to a shortest-path policy that ignores the traffic monitoring task, our proposed method better utilizes the battery and detour budget to explore the city area and obtain adequate traffic information. Moreover, thanks to its decentralized design, this "meet-and-merge" strategy achieves near-global optimality in network coverage with significantly reduced computation overhead compared to the centralized baseline.

SY · Apr 3
Rollout-Based Charging Scheduling for Electric Truck Fleets in Large Transportation Networks

Ting Bai, Xinfeng Ru, Shaoyuan Li et al.

In this paper, we investigate the charging scheduling optimization problem for large electric truck fleets operating with dedicated charging infrastructure. A central coordinator jointly determines the charging sequence and power allocation of each truck to minimize the total operational cost of the fleet. The problem is inherently combinatorial and nonlinear due to the coupling between discrete sequencing decisions and continuous charging control, rendering exact optimization intractable for real-time implementation. To address this challenge, we propose a rollout-based dynamic programming framework built upon an inner-outer two-layer structure, which decouples ordering decisions from the schedule optimization, thus enabling efficient policy evaluation and approximation. The proposed method achieves near-optimal solutions with polynomial-time complexity and adapts to dynamic arrivals and time-varying electricity prices. Simulation studies show that the rollout-based approach significantly outperforms conventional heuristics with high computational efficiency, demonstrating its effectiveness and practical applicability for real-time charging management in large-scale transportation networks.

SY · Apr 3
An Online Learning Approach for Two-Player Zero-Sum Linear Quadratic Games

Shanting Wang, Weihao Sun, Andreas A. Malikopoulos

In this paper, we present an online learning approach for two-player zero-sum linear quadratic games with unknown dynamics. We develop a framework combining regularized least squares model estimation, high-probability confidence sets, and surrogate model selection to maintain a regular model for policy updates. We apply a shrinkage step at each episode to identify a surrogate model in the region where the generalized algebraic Riccati equation admits a stabilizing saddle-point solution. We then establish a regret analysis of the algorithm's convergence, followed by a numerical example that illustrates the convergence performance and verifies the regret analysis.

LG · Apr 2
Communication-Efficient Distributed Learning with Differential Privacy

Xiaoxing Ren, Yuwen Ma, Nicola Bastianello et al.

We address nonconvex learning problems over undirected networks. In particular, we focus on the challenge of designing an algorithm that is communication-efficient and guarantees the privacy of the agents' data. The first goal is achieved through a local training approach, which reduces communication frequency. The second goal is achieved by perturbing gradients during local training, specifically through gradient clipping and additive noise. We prove that the resulting algorithm converges to within a bounded distance of a stationary point of the problem. Additionally, we provide theoretical privacy guarantees within a differential privacy framework that ensure agents' training data cannot be inferred from the trained model shared over the network. We show the algorithm's superior performance on a classification task under the same privacy budget, compared with state-of-the-art methods.
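The gradient perturbation described in this abstract (clipping followed by additive noise during local training) can be sketched as follows. This is a minimal single-agent illustration, not the paper's algorithm: the function names, the choice of Gaussian noise, and the plain gradient step are assumptions, and the calibration of `noise_std` to a privacy budget and the communication step between agents are omitted.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its Euclidean norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]


def privatize_gradient(grad, clip_norm, noise_std, rng):
    """Clip the gradient, then add zero-mean Gaussian noise to each coordinate."""
    clipped = clip_gradient(grad, clip_norm)
    return [g + rng.gauss(0.0, noise_std) for g in clipped]


def local_training(w, grad_fn, steps, lr, clip_norm, noise_std, rng):
    """Run several perturbed gradient steps locally, before any communication round."""
    for _ in range(steps):
        g = privatize_gradient(grad_fn(w), clip_norm, noise_std, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical usage on f(w) = ||w||^2, whose gradient is 2w.
rng = random.Random(0)
w_final = local_training([1.0, -2.0], lambda w: [2.0 * wi for wi in w],
                         steps=5, lr=0.1, clip_norm=1.0, noise_std=0.05, rng=rng)
```

Clipping bounds how much any single data point can move the update, which is what makes a fixed noise scale meaningful.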

SY · Mar 29
Safety-Constrained Optimal Control for Unknown System Dynamics

Panagiotis Kounatidis, Andreas A. Malikopoulos

In this paper, we present a framework for solving continuous optimal control problems when the true system dynamics are approximated through an imperfect model. We derive a control strategy by applying Pontryagin's Minimum Principle to the model-based Hamiltonian functional, which includes an additional penalty term that captures the deviation between the model and the true system. We then derive conditions under which this model-based strategy coincides with the optimal control strategy for the true system under mild convexity assumptions. We demonstrate the framework on a real robotic testbed for the cruise control application with safety distance constraints.

SY · Jan 12, 2023
Approximate Information States for Worst-Case Control and Learning in Uncertain Systems

Aditya Dave, Nishanth Venkatesh, Andreas A. Malikopoulos

In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state, and introduce conditions to identify an uncertain variable that can be used to compute an optimal strategy through a dynamic program (DP). Next, we relax these conditions and define approximate information states that can be learned from output data without knowledge of system dynamics. We use approximate information states to formulate a DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.

LG · Sep 12, 2023
A Q-learning Approach for Adherence-Aware Recommendations

Ioannis Faros, Aditya Dave, Andreas A. Malikopoulos

In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with which an HDM follows the recommended actions and derives the best recommendation policy in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios.

OC · Mar 28, 2023
Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon

Aditya Dave, Ioannis Faros, Nishanth Venkatesh et al.

Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is defining an approximate information state that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy and illustrate the effectiveness of our approach in partially observed decision-making problems with a numerical example.
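To make the worst-case discounted objective concrete, the sketch below runs a minimax value iteration on a tiny finite problem. It assumes a fully observed state, which is a deliberate simplification: the paper addresses partially observed systems through information states, and none of that machinery appears here. The function name and the toy problem are hypothetical.

```python
def minimax_value_iteration(states, controls, disturbances, step, cost,
                            gamma=0.9, iterations=200):
    """Iterate V(s) = min_u max_w [cost(s, u, w) + gamma * V(step(s, u, w))]."""
    values = {s: 0.0 for s in states}
    for _ in range(iterations):
        values = {
            s: min(
                max(cost(s, u, w) + gamma * values[step(s, u, w)]
                    for w in disturbances)
                for u in controls
            )
            for s in states
        }
    return values


# Hypothetical toy problem: positions 0..2, the controller pushes toward 0,
# and the disturbance may push one step back.
states = [0, 1, 2]
controls = [-2, 0]
disturbances = [0, 1]


def step(s, u, w):
    return min(2, max(0, s + u + w))


def cost(s, u, w):
    return float(s)


values = minimax_value_iteration(states, controls, disturbances, step, cost)
```

The inner `max` treats the disturbance as an adversary with no probability distribution attached, which matches the non-stochastic flavor of the formulation.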

SY · Apr 16
Spatiotemporal Forecasting of Incidents and Congestion with Implications for Sustainable Traffic Control

Tony Kinchen, Ting Bai, Nishanth Venkatesh S. et al.

Urban traffic anomalies, such as collisions and disruptions, threaten the safety, efficiency, and sustainability of transportation systems. In this paper, we present a simulation-based framework for modeling, detecting, and predicting such anomalies in urban networks. Using the Simulation of Urban MObility (SUMO) platform, we generate reproducible rear-end and intersection crash scenarios with matched baselines, enabling controlled experimentation and comparative evaluation. We record vehicle-level travel time, speed, and emissions for both edge- and network-level analysis. Building on this dataset, we develop a hybrid forecasting architecture that combines bidirectional long short-term memory networks with a diffusion convolutional recurrent neural network to capture temporal dynamics and spatial dependencies. Our simulation studies on the Broadway corridor in New York City demonstrate the framework's ability to reproduce consistent incident conditions, quantify their effects, and provide accurate multi-horizon traffic forecasts. Our results highlight the value of combining controlled anomaly generation with deep predictive models to support reproducible evaluation and sustainable traffic management.

LG · Apr 6
Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

Nishanth Venkatesh, Andreas A. Malikopoulos

Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational data may be biased. This challenge is especially pronounced in partially observable systems, where latent factors may jointly affect actions, rewards, and future observations. Recent work has shown that policy evaluation in such confounded partially observable Markov decision processes (POMDPs) can be reduced to estimating reward-emission and observation-transition bridge functions satisfying conditional moment restrictions (CMRs). In this paper, we study the statistical estimation of these bridge functions. We formulate bridge learning as a CMR problem with nuisance objects given by a conditional mean embedding and a conditional density. We then develop a $K$-fold cross-fitted extension of the existing two-stage bridge estimator. The proposed procedure preserves the original bridge-based identification strategy while using the available data more efficiently than a single sample split. We also derive an oracle-comparator bound for the cross-fitted estimator and decompose the resulting error into a Stage I term induced by nuisance estimation and a Stage II term induced by empirical averaging.

LG · Dec 8, 2025
Model-Based Reinforcement Learning Under Confounding

Nishanth Venkatesh, Andreas A. Malikopoulos

We investigate model-based reinforcement learning in contextual Markov decision processes (C-MDPs) in which the context is unobserved and induces confounding in the offline dataset. In such settings, conventional model-learning methods are fundamentally inconsistent, as the transition and reward mechanisms generated under a behavioral policy do not correspond to the interventional quantities required for evaluating a state-based policy. To address this issue, we adapt a proximal off-policy evaluation approach that identifies the confounded reward expectation using only observable state-action-reward trajectories under mild invertibility conditions on proxy variables. When combined with a behavior-averaged transition model, this construction yields a surrogate MDP whose Bellman operator is well defined and consistent for state-based policies, and which integrates seamlessly with the maximum causal entropy (MaxCausalEnt) model-learning framework. The proposed formulation enables principled model learning and planning in confounded environments where contextual information is unobserved, unavailable, or impractical to collect.

SY · Mar 8, 2024
A Framework for Effective AI Recommendations in Cyber-Physical-Human Systems

Aditya Dave, Heeseung Bang, Andreas A. Malikopoulos

Many cyber-physical-human systems (CPHS) involve a human decision-maker who may receive recommendations from an artificial intelligence (AI) platform while holding the ultimate responsibility of making decisions. In such CPHS applications, the human decision-maker may depart from an optimal recommended decision and instead implement a different one for various reasons. In this letter, we develop a rigorous framework to overcome this challenge. In our framework, we consider that humans may deviate from AI recommendations as they perceive and interpret the system's state in a different way than the AI platform. We establish the structural properties of optimal recommendation strategies and develop an approximate human model (AHM) used by the AI. We provide theoretical bounds on the optimality gap that arises from an AHM and illustrate the efficacy of our results in a numerical example.

LG · Apr 28, 2025
AI Recommendation Systems for Lane-Changing Using Adherence-Aware Reinforcement Learning

Weihao Sun, Heeseung Bang, Andreas A. Malikopoulos

In this paper, we present an adherence-aware reinforcement learning (RL) approach aimed at seeking optimal lane-changing recommendations within a semi-autonomous driving environment to enhance a single vehicle's travel efficiency. The problem is framed within a Markov decision process setting and is addressed through an adherence-aware deep Q network, which takes into account the partial compliance of human drivers with the recommended actions. This approach is evaluated within CARLA's driving environment under realistic scenarios.

SY · Apr 3, 2025
Route Recommendations for Traffic Management Under Learned Partial Driver Compliance

Heeseung Bang, Jung-Hoon Cho, Cathy Wu et al. · MIT

In this paper, we aim to mitigate congestion in traffic management systems by guiding travelers along system-optimal (SO) routes. However, we recognize that most theoretical approaches assume perfect driver compliance, which often does not reflect reality, as drivers tend to deviate from recommendations to fulfill their personal objectives. Therefore, we propose a route recommendation framework that explicitly learns partial driver compliance and optimizes traffic flow under realistic adherence. We first compute an SO edge flow through flow optimization techniques. Next, we train a compliance model based on historical driver decisions to capture individual responses to our recommendations. Finally, we formulate a stochastic optimization problem that minimizes the gap between the target SO flow and the realized flow under conditions of imperfect adherence. Our simulations conducted on a grid network reveal that our approach significantly reduces travel time compared to baseline strategies, demonstrating the practical advantage of incorporating learned compliance into traffic management.

AI · Apr 1, 2025
Off-Policy Evaluation for Sequential Persuasion Process with Unobserved Confounding

Nishanth Venkatesh S., Heeseung Bang, Andreas A. Malikopoulos

In this paper, we expand the Bayesian persuasion framework to account for unobserved confounding variables in sender-receiver interactions. While traditional models assume that belief updates follow Bayesian principles, real-world scenarios often involve hidden variables that impact the receiver's belief formation and decision-making. We conceptualize this as a sequential decision-making problem, where the sender and receiver interact over multiple rounds. In each round, the sender communicates with the receiver, who also interacts with the environment. Crucially, the receiver's belief update is affected by an unobserved confounding variable. By reformulating this scenario as a Partially Observable Markov Decision Process (POMDP), we capture the sender's incomplete information regarding both the dynamics of the receiver's beliefs and the unobserved confounder. We prove that finding an optimal observation-based policy in this POMDP is equivalent to solving for an optimal signaling strategy in the original persuasion framework. Furthermore, we demonstrate how this reformulation facilitates the application of proximal learning for off-policy evaluation in the persuasion process. This advancement enables the sender to evaluate alternative signaling strategies using only observational data from a behavioral policy, thus eliminating the necessity for costly new experiments.

SY · Apr 1
A Functional Learning Approach for Team-Optimal Traffic Coordination

Weihao Sun, Gehui Xu, Alessio Moreschini et al.

In this paper, we develop a kernel-based policy iteration functional learning framework for computing team-optimal strategies in traffic coordination problems. We consider a multi-agent discrete-time linear system with a cost function that combines quadratic regulation terms and nonlinear safety penalties. Building on the Hilbert space formulation of offline receding-horizon policy iteration, we seek approximate solutions within a reproducing kernel Hilbert space, where the policy improvement step is implemented via a discrete Fréchet derivative. We further study the model-free receding-horizon scenario, where the system dynamics are estimated using recursive least squares, followed by updating the policy using rolling online data. The proposed method is tested in signal-free intersection scenarios via both model-based and model-free simulations and validated in SUMO.
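The model-free step in this abstract estimates the system dynamics with recursive least squares. The sketch below shows the standard recursive least squares update on a scalar linear system x_next = a * x + b * u. The system, its parameter values, the input sequence, and the initialization are assumptions chosen for illustration, and the kernel-based policy iteration itself is not shown.

```python
def rls_update(theta, P, phi, y, forgetting=1.0):
    """One recursive least squares step for the model y = phi . theta."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so the row vector phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
         for i in range(n)]
    return theta, P


# Hypothetical noise-free system used only to exercise the update.
a_true, b_true = 0.8, 0.5
theta = [0.0, 0.0]                 # initial estimate of [a, b]
P = [[1e6, 0.0], [0.0, 1e6]]       # large initial covariance (weak prior)
x = 1.0
for k in range(20):
    u = 1.0 if k % 2 == 0 else -1.0
    x_next = a_true * x + b_true * u
    theta, P = rls_update(theta, P, [x, u], x_next)
    x = x_next
```

Because each update reuses the previous estimate and covariance, the cost per step stays constant as rolling online data arrives.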

LG · Oct 22, 2025
A Communication-Efficient Decentralized Actor-Critic Algorithm

Xiaoxing Ren, Nicola Bastianello, Thomas Parisini et al.

In this paper, we study the problem of reinforcement learning in multi-agent systems where communication among agents is limited. We develop a decentralized actor-critic learning framework in which each agent performs several local updates of its policy and value function, where the latter is approximated by a multi-layer neural network, before exchanging information with its neighbors. This local training strategy substantially reduces the communication burden while maintaining coordination across the network. We establish a finite-time convergence analysis for the algorithm under Markovian sampling. Specifically, to attain an $\varepsilon$-accurate stationary point, the sample complexity is of order $\mathcal{O}(\varepsilon^{-3})$ and the communication complexity is of order $\mathcal{O}(\varepsilon^{-1}\tau^{-1})$, where $\tau$ denotes the number of local training steps. We also show how the final error bound depends on the neural network's approximation quality. Numerical experiments in a cooperative control setting illustrate and validate the theoretical findings.

RO · Sep 5, 2025
Microrobot Vascular Parkour: Analytic Geometry-based Path Planning with Real-time Dynamic Obstacle Avoidance

Yanda Yang, Max Sokolich, Fatma Ceren Kirmizitas et al.

Autonomous microrobots in blood vessels could enable minimally invasive therapies, but navigation is challenged by dense, moving obstacles. We propose a real-time path planning framework that couples an analytic geometry global planner (AGP) with two reactive local escape controllers, one based on rules and one based on reinforcement learning, to handle sudden moving obstacles. Using real-time imaging, the system estimates the positions of the microrobot, obstacles, and targets and computes collision-free motions. In simulation, AGP yields shorter paths and faster planning than weighted A* (WA*), particle swarm optimization (PSO), and rapidly exploring random trees (RRT), while maintaining feasibility and determinism. We extend AGP from 2D to 3D without loss of speed. In both simulations and experiments, the combined global planner and local controllers reliably avoid moving obstacles and reach targets. The average planning time is 40 ms per frame, compatible with 25 fps image acquisition and real-time closed-loop control. These results advance autonomous microrobot navigation and targeted drug delivery in vascular environments.

RO · Nov 5, 2021
A First-Order Approach to Model Simultaneous Control of Multiple Microrobots

Logan E. Beaver, Sambeeta Das, Andreas A. Malikopoulos

The control of swarm systems is relatively well understood for simple robotic platforms at the macro scale. However, there are still several unanswered questions about how similar results can be achieved for microrobots. In this paper, we propose a modeling framework based on a dynamic model of magnetized self-propelling Janus microrobots under a global magnetic field. We verify our model experimentally and provide methods for accurately describing the behavior of microrobots while modeling their simultaneous control. The model can be generalized to other microrobotic platforms in low Reynolds number environments.

RO · Sep 13, 2021
Constraint-Driven Optimal Control of Multi-Agent Systems: A Highway Platooning Case Study

Logan E. Beaver, Andreas A. Malikopoulos

Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation.

RO · Sep 7, 2021
A Digital Smart City for Emerging Mobility Systems

Raymond M. Zayas, Logan E. Beaver, Behdad Chalaki et al.

The increasing demand for emerging mobility systems with connected and automated vehicles has imposed the necessity for quality testing environments to support their development. In this paper, we introduce a Unity-based virtual simulation environment for emerging mobility systems, called the Information and Decision Science Lab's Scaled Smart Digital City (IDS 3D City), intended to operate alongside its physical peer and its established control framework. By utilizing the Robot Operating System, AirSim, and Unity, we constructed a simulation environment capable of iteratively designing experiments significantly faster than is possible in a physical testbed. This environment provides an intermediate step to validate the effectiveness of our control algorithms prior to their implementation in the physical testbed. The IDS 3D City also enables us to demonstrate that our control algorithms work independently of the underlying vehicle dynamics, as the vehicle dynamics introduced by AirSim operate at a different scale than our scaled smart city. Finally, we demonstrate the behavior of our digital environment by performing an experiment in both the virtual and physical environments and comparing their outputs.

RO · Mar 4, 2021
Optimal Control of Differentially Flat Systems is Surprisingly Easy

Logan E. Beaver, Andreas A. Malikopoulos

As we move to increasingly complex cyber-physical systems (CPS), new approaches are needed to plan efficient state trajectories in real-time. In this paper, we propose an approach to significantly reduce the complexity of solving optimal control problems for a class of CPS with nonlinear dynamics. We exploit the property of differential flatness to simplify the Euler-Lagrange equations that arise during optimization, and this simplification eliminates the numerical instabilities that plague optimal control in general. We also present an explicit differential equation that describes the evolution of the optimal state trajectory, and we extend our results to consider both the unconstrained and constrained cases. Furthermore, we demonstrate the performance of our approach by generating the optimal trajectory for a planar manipulator with two revolute joints. We show in simulation that our approach is able to generate the constrained optimal trajectory in 4.5 ms while respecting workspace constraints and switching between a "left" and "right" bend in the elbow joint.

OC · Nov 5, 2020
A Hysteretic Q-learning Coordination Framework for Emerging Mobility Systems in Smart Cities

Behdad Chalaki, Andreas A. Malikopoulos

Connected and automated vehicles (CAVs) can alleviate traffic congestion, reduce air pollution, and improve safety. In this paper, we provide a decentralized coordination framework for CAVs at a signal-free intersection to minimize travel time and improve fuel efficiency. We employ a simple yet powerful reinforcement learning approach, an off-policy temporal difference learning method called Q-learning, enhanced with a coordination mechanism to address this problem. Then, we integrate a first-in-first-out queuing policy to improve the performance of our system. We demonstrate the efficacy of our proposed approach through simulation and comparison with the classical optimal control method based on Pontryagin's minimum principle.
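For readers unfamiliar with the term in the title, the standard hysteretic Q-learning rule uses two learning rates: a larger one when the temporal difference error is non-negative and a smaller one when it is negative, so that an agent is not overly penalized for teammates' exploratory actions. The sketch below shows that generic update only. The state and action labels, rewards, and rate values are hypothetical, and the paper's coordination mechanism and first-in-first-out queuing policy are not represented.

```python
from collections import defaultdict


def hysteretic_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, beta=0.1, gamma=0.9):
    """Apply one hysteretic Q-learning update in place and return the TD error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    # Optimistic asymmetry: learn quickly from good news, slowly from bad news.
    rate = alpha if td_error >= 0 else beta
    q[(state, action)] += rate * td_error
    return td_error


# Hypothetical two-action example.
q = defaultdict(float)
actions = ["go", "yield"]
first_error = hysteretic_update(q, "s0", "go", 1.0, "s1", actions)    # uses alpha
second_error = hysteretic_update(q, "s0", "go", -1.0, "s1", actions)  # uses beta
```

Setting `beta` equal to `alpha` recovers ordinary Q-learning, which makes the asymmetry easy to ablate.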

OC · Jan 30, 2020
Experimental Validation of a Real-Time Optimal Controller for Coordination of CAVs in a Multi-Lane Roundabout

Behdad Chalaki, Logan E. Beaver, Andreas A. Malikopoulos

Roundabouts, in conjunction with other traffic scenarios, e.g., intersections, merging roadways, and speed reduction zones, can induce congestion in a transportation network due to driver responses to various disturbances. Research efforts have shown that smoothing traffic flow and eliminating stop-and-go driving can improve both the fuel efficiency of the vehicles and the throughput of a roundabout. In this paper, we validate an optimal control framework developed earlier in a multi-lane roundabout scenario using the University of Delaware's scaled smart city (UDSSC). We first provide conditions under which the solution is optimal. Then, we demonstrate the feasibility of the solution using experiments at UDSSC, and show that the optimal solution completely eliminates stop-and-go driving while preserving safety.