Jonathon M. Smereka

100 citations

11 Papers

MA · Jul 17, 2022
Task Allocation with Load Management in Multi-Agent Teams

Haochen Wu, Amin Ghadami, Alparslan Emrah Bayrak et al.

In operations of multi-agent teams ranging from homogeneous robot swarms to heterogeneous human-autonomy teams, unexpected events might occur. While operational efficiency is the primary objective in multi-agent task allocation problems, it is essential that the decision-making framework be intelligent enough to manage unexpected task load with limited resources. Otherwise, operational effectiveness would plummet as overloaded agents face unforeseen risks. In this work, we present a decision-making framework for multi-agent teams to learn task allocation with consideration of load management through decentralized reinforcement learning, where idling is encouraged and unnecessary resource usage is avoided. We illustrate the effect of load management on team performance and explore agent behaviors in example scenarios. Furthermore, a measure of agent importance in collaboration is developed to infer team resilience when handling potential overload situations.

RO · Mar 16
MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Shahil Shaik, Aditya Parameshwaran, Anshul Nayak et al.

Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often generalizes poorly across environments. At the same time, large vision-language-action models (VLAs) trained on internet-scale data exhibit strong multimodal reasoning and zero-shot generalization, yet deploying them directly for robotic execution remains computationally prohibitive, particularly in heterogeneous multi-robot systems with diverse embodiments and resource constraints. To address these challenges, we propose Multi-Agent Vision-Language-Critic Models (MA-VLCM), a framework that replaces the learned centralized critic in MARL with a pretrained vision-language model (VLM) fine-tuned to evaluate multi-agent behavior. MA-VLCM acts as a centralized critic conditioned on natural language task descriptions, visual trajectory observations, and structured multi-agent state information. By eliminating critic learning during policy optimization, our approach significantly improves sample efficiency while producing compact execution policies suitable for deployment on resource-constrained robots. Results show good zero-shot return estimation for models with different VLM backbones in both in-distribution and out-of-distribution multi-agent team scenarios.

LG · Jan 22
Multi-Agent Deep Reinforcement Learning Under Constrained Communications

Shahil Shaik, Jonathon M. Smereka, Yue Wang

Centralized training with decentralized execution (CTDE) has been the dominant paradigm in multi-agent reinforcement learning (MARL), but its reliance on global state information during training introduces scalability, robustness, and generalization bottlenecks. Moreover, in practical scenarios such as adding or dropping teammates, or facing environment dynamics that differ from those seen during training, CTDE methods can be brittle and costly to retrain, whereas distributed approaches allow agents to adapt using only local information and peer-to-peer communication. We present a distributed MARL framework that removes the need for centralized critics or global information. First, we develop a novel Distributed Graph Attention Network (D-GAT) that performs global state inference through multi-hop communication, where agents integrate neighbor features via input-dependent attention weights in a fully distributed manner. Leveraging D-GAT, we develop distributed graph-attention MAPPO (DG-MAPPO), a distributed MARL framework in which agents optimize local policies and value functions using local observations, multi-hop communication, and shared or averaged rewards. Empirical evaluation on the StarCraft II Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo demonstrates that our method consistently outperforms strong CTDE baselines, achieving superior coordination across a wide range of cooperative tasks with both homogeneous and heterogeneous teams. Our distributed MARL framework provides a principled and scalable solution for robust collaboration, eliminating the need for centralized training or global observability. To the best of our knowledge, DG-MAPPO is the first method to fully eliminate reliance on privileged centralized information, enabling agents to learn and act solely through peer-to-peer communication.
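
A rough sketch of the kind of multi-hop, attention-weighted neighbor aggregation described above, assuming each agent holds a feature vector and can exchange it only with its direct graph neighbors in each communication round. The scaled dot-product score, the absence of learned projection matrices, and the names `attention_hop` and `multi_hop` are illustrative assumptions, not the D-GAT architecture itself.

```python
import math
from typing import Dict, List

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_hop(features: Dict[int, Vector],
                  neighbors: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One communication round: every agent attends only over itself and its direct neighbors."""
    updated: Dict[int, Vector] = {}
    for i, h_i in features.items():
        peers = [i] + neighbors[i]
        dim = len(h_i)
        # Input-dependent scores (hypothetical choice): scaled dot product with each peer's feature.
        scores = [sum(a * b for a, b in zip(h_i, features[j])) / math.sqrt(dim) for j in peers]
        weights = softmax(scores)
        # New feature = attention-weighted average of the peers' current features.
        updated[i] = [sum(w * features[j][d] for w, j in zip(weights, peers)) for d in range(dim)]
    return updated

def multi_hop(features: Dict[int, Vector],
              neighbors: Dict[int, List[int]], hops: int) -> Dict[int, Vector]:
    """Repeat the local step; after k hops, agent i has mixed in features from up to k edges away."""
    for _ in range(hops):
        features = attention_hop(features, neighbors)
    return features

# Line graph 0 - 1 - 2: agents 0 and 2 cannot talk directly.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 1.0]}
print(multi_hop(features, neighbors, 1)[0])  # second entry is 0.0: agent 2's signal has not arrived
print(multi_hop(features, neighbors, 2)[0])  # second entry is now positive
```

After one hop, agent 0 has no information about agent 2's feature; after two hops it does, which is the sense in which multi-hop communication lets agents infer non-local state without any central node.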

LG · Jul 23, 2025
Generalized Advantage Estimation for Distributional Policy Gradients

Shahil Shaik, Jonathon M. Smereka, Yue Wang

Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimate of the advantage function to reduce the variance of policy gradient estimates. Despite its effectiveness, GAE is not designed to handle the value distributions integral to distributional RL, which can capture the inherent stochasticity of systems and is hence more robust to system noise. To address this gap, we propose a novel approach that uses optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancy between probability distributions. Using exponentially weighted estimation, we leverage this metric to derive distributional GAE (DGAE). Like traditional GAE, DGAE provides a low-variance advantage estimate with controlled bias, making it well suited to policy gradient algorithms that rely on advantage estimation for policy updates. We integrate DGAE into three different policy gradient methods, evaluate the resulting algorithms across various OpenAI Gym environments, and compare their performance with baselines that use traditional GAE.
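
For reference, a minimal sketch of standard scalar GAE, the exponentially weighted estimator that DGAE generalizes. It computes the one-step TD error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and accumulates A_t = delta_t + gamma * lam * A_{t+1} backward in time. DGAE replaces the scalar TD error with the Wasserstein-like directional metric between value distributions; that metric is not reproduced here. The function name and default hyperparameters are illustrative choices.

```python
from typing import List, Sequence

def compute_gae(rewards: Sequence[float], values: Sequence[float],
                dones: Sequence[bool], gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard (scalar) GAE.

    values must hold len(rewards) + 1 entries: V(s_0), ..., V(s_T), where the last
    entry is the bootstrap value for the state after the final step.
    dones[t] is True if the episode terminated after step t.
    """
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted sum: A_t = delta_t + gamma * lam * A_{t+1}, reset at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages

print(compute_gae([1.0, 1.0], [0.0, 0.0, 0.0], [False, False], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting lam = 0 recovers the one-step TD error (low variance, more bias), while lam = 1 gives the discounted return minus the baseline (unbiased, higher variance); the exponential weighting between these extremes is the property DGAE carries over to the distributional setting.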

RO · Jan 6, 2022
Trust-based Symbolic Motion Planning for Multi-robot Bounding Overwatch

Huanfei Zheng, Jonathon M. Smereka, Dariusz Mikulski et al.

Multi-robot bounding overwatch requires timely coordination of robot team members. Symbolic motion planning (SMP) can provide provably correct solutions for robot motion planning with high-level temporal logic task requirements. This paper aims to develop a framework for safe and reliable SMP of multi-robot systems (MRS) that satisfies complex bounding overwatch tasks constrained by temporal logics. A decentralized SMP framework is first presented, which guarantees both correctness and parallel execution of the complex bounding overwatch tasks by the MRS. A computational trust model is then constructed based on the traversability and line of sight of robots in the terrain. The trust model predicts the trustworthiness of each robot team's potential behavior in executing a task plan. The most trustworthy task and motion plan is then found with a Dijkstra search strategy to guarantee the reliability of MRS bounding overwatch. A robot simulation is implemented in ROS Gazebo to demonstrate the effectiveness of the proposed framework.
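
A small sketch of how a Dijkstra search can return a "most trustworthy" plan, under two assumptions that are not taken from the paper: each plan transition carries a trust value in (0, 1], and the trust of a whole plan is the product of its transition trusts. Using the edge cost -log(trust) turns "maximize the product" into "minimize the sum", which Dijkstra handles because the costs are non-negative. The graph and its trust values are hypothetical; the paper's trust model (traversability, line of sight) is not reproduced.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical plan graph: node -> list of (next_node, trust), with trust in (0, 1].
Graph = Dict[str, List[Tuple[str, float]]]

def most_trustworthy_path(graph: Graph, start: str, goal: str) -> Tuple[List[str], float]:
    """Dijkstra on edge cost -log(trust); the min-cost path maximizes the product of edge trusts."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == goal:
            break
        for v, trust in graph.get(u, []):
            nd = d - math.log(trust)  # non-negative because trust <= 1
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return [], 0.0  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[goal])

plans = {"A": [("B", 0.9), ("C", 0.99)], "B": [("D", 0.9)], "C": [("D", 0.5)]}
print(most_trustworthy_path(plans, "A", "D"))  # (['A', 'B', 'D'], approximately 0.81)
```

The route through C starts with the single most trusted transition (0.99) but ends with a weak one (0.5), so its overall trust (about 0.495) loses to the route through B (about 0.81); a greedy choice of the best next step would have picked the wrong plan.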

RO · Jan 6, 2022
Bayesian Optimization Based Trustworthiness Model for Multi-robot Bounding Overwatch

Huanfei Zheng, Jonathon M. Smereka, Dariusz Mikulski et al.

In multi-robot system (MRS) bounding overwatch, it is crucial to determine which point to choose for overwatch at each step and whether the robots' positions are trustworthy, so that the overwatch can be performed effectively. In this paper, we develop a Bayesian optimization based computational trustworthiness model (CTM) for the MRS to select overwatch points. The CTM can provide real-time trustworthiness evaluation of overwatch points for the MRS by drawing on the robots' situational awareness information, such as traversability and line of sight. The evaluation quantifies each robot's trustworthiness in protecting its team members during bounding overwatch. The trustworthiness evaluation generates a dynamic cost map for each robot in the workspace and helps obtain the most trustworthy bounding overwatch path. Our proposed Bayesian optimization based CTM and motion planning reduce the number of workspace explorations needed for data collection and improve CTM learning efficiency. They also enable the MRS to handle dynamic and uncertain environments in the multi-robot bounding overwatch task. A robot simulation is implemented in ROS Gazebo to demonstrate the effectiveness of the proposed framework.

AI · Aug 7, 2020
Efficient algorithms for electric vehicles' min-max routing problem

Seyed Sajjad Fazeli, Saravanan Venkatachalam, Jonathon M. Smereka

An increase in greenhouse gas emissions from the transportation sector has led companies and the government to promote and support the production of electric vehicles (EVs). With recent developments in urbanization and e-commerce, transportation companies are replacing their conventional fleets with EVs to strengthen efforts toward sustainable and environmentally friendly operations. However, deploying a fleet of EVs requires efficient routing and recharging strategies to compensate for their limited range and mitigate the battery degradation rate. In this work, a fleet of EVs with limited battery capacity and scarce charging station availability is considered for transportation and logistics operations. We introduce a min-max electric vehicle routing problem (MEVRP) in which the maximum distance traveled by any EV is minimized while accounting for recharging at charging stations. We propose an efficient branch-and-cut framework and a three-phase hybrid heuristic algorithm that can efficiently solve a variety of instances. Extensive computational experiments and sensitivity analyses are performed to corroborate the efficiency of the proposed approach, both quantitatively and qualitatively.
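
To make the min-max objective concrete, here is a brute-force sketch that, for a tiny instance, tries every assignment of customers to vehicles and every visiting order, and keeps the assignment whose longest route is shortest. It deliberately ignores battery capacity and charging stations, which the MEVRP includes, and it is exponential in the number of customers; it is an illustration of the objective only, not the branch-and-cut framework or the hybrid heuristic. All names and coordinates are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def route_length(depot: Point, stops: Sequence[Point]) -> float:
    """Length of the closed tour depot -> stops (in order) -> depot."""
    tour = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))

def best_single_route(depot: Point, customers: Sequence[Point]) -> float:
    """Shortest tour visiting all given customers, found by trying every visiting order."""
    if not customers:
        return 0.0
    return min(route_length(depot, perm) for perm in itertools.permutations(customers))

def min_max_routing(depot: Point, customers: List[Point], num_vehicles: int) -> float:
    """Min-max objective: minimize the longest single-vehicle route over all customer assignments."""
    best = math.inf
    for assignment in itertools.product(range(num_vehicles), repeat=len(customers)):
        longest = 0.0
        for k in range(num_vehicles):
            stops = [c for c, a in zip(customers, assignment) if a == k]
            longest = max(longest, best_single_route(depot, stops))
        best = min(best, longest)
    return best

customers = [(1.0, 0.0), (-1.0, 0.0)]
print(min_max_routing((0.0, 0.0), customers, 1))  # 4.0: one vehicle covers both customers
print(min_max_routing((0.0, 0.0), customers, 2))  # 2.0: each vehicle serves one customer
```

The min-max objective rewards balancing work across the fleet, which matters for EVs because the longest route is the one most likely to exhaust a battery; the combinatorial blow-up visible here is why exact methods such as branch-and-cut and dedicated heuristics are needed for realistic instance sizes.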

OC · Oct 8, 2019
Two-stage stochastic programming approach for path planning problems under travel time and availability uncertainties

Saravanan Venkatachalam, Manish Bansal, Jonathon M. Smereka et al.

Significant advances in sensing, robotics, and wireless networks have enabled the collaborative use of autonomous aerial, ground, and underwater vehicles for various applications. However, to successfully harness the benefits of unmanned ground vehicles (UGVs) in homeland security operations, it is critical to efficiently solve the UGV path planning problem, which lies at the heart of these operations. Furthermore, in real-world applications, UGV operations encounter uncertainties such as incomplete information about target sites, travel times, and the availability of vehicles, sensors, and fuel. This paper focuses on developing an algebraic modeling framework to enable the successful deployment of a team of vehicles while addressing uncertainties in the distance traveled and the availability of UGVs for the mission.
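
For orientation, the generic textbook form of a two-stage stochastic program with a finite set of scenarios is shown below. Reading the first-stage decisions x as path or vehicle-deployment choices made before uncertainty is revealed, and each scenario s (with probability p_s) as one realization of travel times and UGV availability with recourse decisions y_s, is an assumption about how such a model is typically set up; the paper's specific formulation is not reproduced here.

```latex
\min_{x \in X} \; c^{\top} x + \sum_{s \in S} p_s \, Q(x, \xi_s),
\qquad
Q(x, \xi_s) = \min_{y_s \ge 0} \left\{ q_s^{\top} y_s \;:\; W y_s = h_s - T_s x \right\}
```

The first term is the cost committed up front; the second is the expected cost of the best corrective action once scenario s is observed, so the optimal x hedges against all scenarios rather than only the average one.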

CR · Sep 24, 2018
Security and Performance Considerations in ROS 2: A Balancing Act

Jongkil Kim, Jonathon M. Smereka, Calvin Cheung et al.

Robot Operating System (ROS) 2 is a ground-up redesign of ROS 1 to support performance-critical cyber-physical systems (CPSs) using the Data Distribution Service (DDS) middleware. Accordingly, the security of ROS 2 is highly reliant on the security of its DDS communication protocol. However, finding a balance between performance and security is a non-trivial task. Inappropriate security implementations may cause not only significant performance loss but also security failures in the system. In this paper, we provide an analysis of the DDS security protocol as well as an overview of how to find the balance between performance and security. To accomplish this, we evaluate the latency and throughput of the ROS 2 communication protocols in both wired and wireless networks, and measure the efficiency loss caused by enabling security protocols such as a Virtual Private Network (VPN) and the DDS security protocol in ROS 2 in both network setups. Robotics developers can use the results directly to find optimal, balanced settings for ROS 2 applications. Additionally, we analyze the security specification of DDS against existing security standards and test the implementation of the DDS protocol by performing static analysis. The results of this work can be used to enhance the security of ROS 2.

RO · Sep 10, 2018
Open Problems in Robotic Anomaly Detection

Ritwik Gupta, Zachary T. Kurtz, Sebastian Scherer et al.

Failures in robotics can have disastrous consequences that worsen rapidly over time. Thus, the ability to rely on robotic systems depends on our ability to monitor them and intercede when necessary, manually or autonomously. Prior work in this area surveys intrusion detection and security challenges in robotics, but a discussion of the more general anomaly detection problems is lacking. As such, we provide a brief, insight-focused discussion and frameworks of thought on some compelling open problems in anomaly detection for robotic systems. Namely, we discuss non-malicious faults, invalid data, intentional anomalous behavior, hierarchical anomaly detection, distribution of computation, and anomaly correction on the fly. We demonstrate the need for additional work in these areas through a case study that examines the limitations of implementing a basic anomaly detection (AD) system in the Robot Operating System (ROS) 2 middleware. The case study shows that if even supporting a basic system is a significant hurdle, the path to more complex and advanced AD systems is even more problematic. We discuss these ROS 2 platform limitations with respect to supporting solutions in robotic anomaly detection and provide recommendations to address the issues discovered.
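
As a point of reference for what a "basic" anomaly detector can look like, here is a minimal, middleware-independent rolling z-score detector over a scalar sensor stream. It is a generic example, not the AD system from the case study; the window size, threshold, and function name are hypothetical. In a ROS 2 setting, logic like this would presumably run inside a subscriber callback, which is where the platform limitations discussed above come into play.

```python
import statistics
from collections import deque
from typing import Iterable, List

def rolling_zscore_anomalies(stream: Iterable[float], window: int = 10,
                             threshold: float = 3.0) -> List[int]:
    """Return indices of samples that deviate from the recent baseline by more than `threshold` std devs."""
    history: deque = deque(maxlen=window)
    flagged: List[int] = []
    for t, x in enumerate(stream):
        if len(history) == window:  # only score once the baseline window is full
            values = list(history)
            mean = statistics.fmean(values)
            std = statistics.pstdev(values)
            if std == 0.0:
                is_anomaly = x != mean  # constant baseline: any change is anomalous
            else:
                is_anomaly = abs(x - mean) / std > threshold
            if is_anomaly:
                flagged.append(t)
        # Every sample, anomalous or not, enters the baseline (a simple but contaminable choice).
        history.append(x)
    return flagged

readings = [10.0, 10.5, 9.5] * 4 + [50.0]
print(rolling_zscore_anomalies(readings))  # [12]
```

Even this simple detector raises several of the open problems listed above: it cannot distinguish a sensor fault from a genuine environmental change, and letting anomalous samples into the baseline means a sustained fault gradually becomes "normal".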

RO · Jul 25, 2018
Tree Search Techniques for Minimizing Detectability and Maximizing Visibility

Zhongshun Zhang, Yoonchang Sung, Lifeng Zhou et al.

We introduce and study the problem of planning a trajectory for an agent to carry out a scouting mission while avoiding detection by an adversarial guard. This introduces an adversarial version of classical visibility-based planning problems such as the Watchman Route Problem. The agent receives a positive reward for increasing its visibility and a negative penalty when it is detected by the guard. The objective is to find a finite-horizon path for the agent that balances the trade-off between maximizing visibility and minimizing detectability. We model this problem as a sequential two-player zero-sum discrete game. A minimax tree search can give the optimal policy for the agent but requires exponential time and space. We propose several pruning techniques to reduce the computational cost while still preserving optimality guarantees. Simulation results show that the proposed strategy reduces the number of nodes by approximately three orders of magnitude compared with the brute-force strategy.
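
The idea of pruning a minimax tree without losing optimality can be illustrated with standard alpha-beta pruning, sketched below with the agent as the maximizer and the guard as the minimizer. This is the classical technique, not the specific pruning rules proposed in the paper; the tree shape and leaf payoffs (standing in for visibility reward minus detection penalty) are made up for illustration.

```python
import math
from typing import List, Optional, Union

# A leaf is the agent's payoff; an inner node is a list of child subtrees.
Tree = Union[float, List["Tree"]]

def minimax(node: Tree, maximizing: bool) -> float:
    """Brute-force minimax: evaluates every leaf."""
    if not isinstance(node, list):
        return float(node)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node: Tree, maximizing: bool, alpha: float = -math.inf,
              beta: float = math.inf, leaves: Optional[List[int]] = None) -> float:
    """Minimax with alpha-beta pruning; returns the same value as minimax()."""
    if not isinstance(node, list):
        if leaves is not None:
            leaves[0] += 1  # count evaluated leaves
        return float(node)
    if maximizing:  # agent's turn
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, leaves))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the guard already has a better option elsewhere
        return value
    value = math.inf  # guard's turn
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, leaves))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the agent already has a better option elsewhere
    return value

tree = [[3, 5], [2, 9], [0, 1]]
counter = [0]
print(minimax(tree, True), alphabeta(tree, True, leaves=counter), counter[0])  # 3.0 3.0 4
```

Alpha-beta evaluates 4 of the 6 leaves here and still returns the exact minimax value; on deeper trees with good move ordering the savings grow rapidly, which is the same kind of effect the domain-specific pruning techniques above aim for.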