RO Mar 26
Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned
Maeva Guerrier, Karthik Soma, Jana Pavlasek et al.
Visual Navigation Models (VNMs) promise generalizable robot navigation by learning from large-scale visual demonstrations. Despite growing real-world deployment, existing evaluations rely almost exclusively on success rate (whether the robot reaches its goal), which conceals trajectory quality, collision behavior, and robustness to environmental change. We present a real-world evaluation of five state-of-the-art VNMs (GNM, ViNT, NoMaD, NaviBridger, and CrossFormer) across two robot platforms and five environments spanning indoor and outdoor settings. Beyond success rate, we combine path-based metrics with vision-based goal-recognition scores and assess robustness through controlled image perturbations (motion blur and sun flare). Our analysis uncovers three systematic limitations: (a) even architecturally sophisticated diffusion and transformer-based models exhibit frequent collisions, indicating limited geometric understanding; (b) models fail to discriminate between different locations that are perceptually similar, even when some semantic differences are present, causing goal prediction errors in repetitive environments; and (c) performance degrades under distribution shift. We will publicly release our evaluation codebase and dataset to facilitate reproducible benchmarking of VNMs.
RO Apr 29
Split over $n$ resource sharing problem: Are fewer capable agents better than many simpler ones?
Karthik Soma, Mohamed S. Talamali, Genki Miyauchi et al.
In multi-agent systems, should limited resources be concentrated into a few capable agents or distributed among many simpler ones? This work formulates the split over $n$ resource sharing problem where a group of $n$ agents equally shares a common resource (e.g., monetary budget, computational resources, physical size). We present a case study in multi-agent coverage where the area of the disk-shaped footprint of agents scales as $1/n$. A formal analysis reveals that the initial coverage rate grows with $n$. However, if the speed of agents decreases proportionally with their radii, groups of all sizes perform equally well, whereas if it decreases proportionally with their footprints, a single agent performs best. We also present computer simulations in which resource splitting increases the failure rates of individual agents. The models and findings help identify optimal distributiveness levels and inform the design of multi-agent systems under resource constraints.
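The three scaling regimes described in this abstract can be checked with a short calculation. The sketch below is not the authors' model: it assumes that a disk of radius r moving at speed v initially sweeps new area at a rate of 2rv, and that the agents' swept regions do not overlap at the start. The function name, the parameter names, and the `speed_rule` labels are hypothetical.

```python
import math


def initial_coverage_rate(n, total_area=math.pi, base_speed=1.0, speed_rule="constant"):
    """Initial area swept per unit time by n disk agents that share total_area.

    Assumption (not from the source): a disk of radius r moving at speed v
    sweeps new area at rate 2 * r * v, and agents do not overlap initially.
    """
    # Each footprint has area total_area / n, so the radius shrinks as 1 / sqrt(n).
    radius = math.sqrt(total_area / (n * math.pi))
    single_radius = math.sqrt(total_area / math.pi)

    if speed_rule == "constant":
        speed = base_speed
    elif speed_rule == "radius":
        # Speed proportional to radius, i.e. base_speed / sqrt(n).
        speed = base_speed * radius / single_radius
    elif speed_rule == "footprint":
        # Speed proportional to footprint area, i.e. base_speed / n.
        speed = base_speed / n
    else:
        raise ValueError(f"unknown speed_rule: {speed_rule}")

    return n * 2.0 * radius * speed


if __name__ == "__main__":
    for rule in ("constant", "radius", "footprint"):
        rates = [round(initial_coverage_rate(n, speed_rule=rule), 3) for n in (1, 4, 16)]
        print(rule, rates)
```

Under these assumptions the group rate scales as $\sqrt{n}$ at constant speed, is independent of $n$ when speed scales with radius, and scales as $1/\sqrt{n}$ when speed scales with footprint, which matches the qualitative claims in the abstract.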
MA Sep 23, 2025
The Heterogeneous Multi-Agent Challenge
Charles Dansereau, Junior-Samuel Lopez-Yepez, Karthik Soma et al.
Multi-Agent Reinforcement Learning (MARL) is a growing research area that has gained significant traction in recent years, extending Deep RL applications to a much wider range of problems. A particularly challenging class of problems in this domain is Heterogeneous Multi-Agent Reinforcement Learning (HeMARL), where agents with different sensors, resources, or capabilities must cooperate based on local information. The large number of real-world situations involving heterogeneous agents makes it an attractive yet underexplored research area, as most MARL research focuses on homogeneous agents (e.g., a swarm of identical robots). In MARL and single-agent RL, standardized environments such as ALE and SMAC have made it possible to establish recognized benchmarks for measuring progress. However, there is a clear lack of such a standardized testbed for cooperative HeMARL. As a result, new research in this field often uses simple environments, where most algorithms perform near optimally, or uses weakly heterogeneous MARL environments.
RO May 24, 2025
Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Maeva Guerrier, Karthik Soma, Hassan Fouad et al.
Safety stands as the primary obstacle preventing the widespread adoption of learning-based robotic systems in our daily lives. While reinforcement learning (RL) shows promise as an effective robot learning paradigm, conventional RL frameworks often model safety by using single scalar negative rewards with immediate episode termination, failing to capture the temporal consequences of unsafe actions (e.g., sustained collision damage). In this work, we introduce a novel approach that simulates these temporal effects by applying continuous negative rewards without episode termination. Our experiments reveal that standard RL methods struggle with this model, as the accumulated negative values in unsafe zones create learning barriers. To address this challenge, we demonstrate how Control Barrier Functions (CBFs), with their proven safety guarantees, effectively help robots avoid catastrophic regions while enhancing learning outcomes. We present three CBF-based approaches, each integrating traditional RL methods with Control Barrier Functions, guiding the agent to learn safe behavior. Our empirical analysis, conducted in both simulated environments and real-world settings using a four-wheel differential drive robot, explores the possibilities of employing these approaches for safe robotic learning.
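To illustrate how a Control Barrier Function can steer an agent away from an unsafe region, the sketch below shows a generic CBF safety filter. It is not one of the three approaches presented in the paper. It assumes a planar single-integrator robot (the commanded velocity is applied directly), a single circular unsafe zone, and the barrier h(x) = ||x - c||^2 - r^2. All names and the `alpha` gain are hypothetical.

```python
def cbf_filter(position, nominal_velocity, obstacle_center, obstacle_radius, alpha=1.0):
    """Minimally modify nominal_velocity so that dh/dt + alpha * h >= 0.

    Assumptions (not from the source): single-integrator dynamics x_dot = u and
    a circular unsafe zone with barrier h(x) = ||x - c||^2 - r^2 (h >= 0 is safe).
    """
    dx = position[0] - obstacle_center[0]
    dy = position[1] - obstacle_center[1]
    h = dx * dx + dy * dy - obstacle_radius ** 2

    # Gradient of h with respect to position.
    gx, gy = 2.0 * dx, 2.0 * dy
    ux, uy = nominal_velocity

    # CBF condition: grad_h . u + alpha * h >= 0.
    slack = gx * ux + gy * uy + alpha * h
    if slack >= 0.0:
        return (ux, uy)  # The nominal action already satisfies the condition.

    grad_norm_sq = gx * gx + gy * gy
    if grad_norm_sq == 0.0:
        return (0.0, 0.0)  # At the obstacle center the gradient gives no direction; stop.

    # Closed-form solution of the one-constraint QP: project onto the half-space boundary.
    scale = -slack / grad_norm_sq
    return (ux + scale * gx, uy + scale * gy)


if __name__ == "__main__":
    # An RL policy proposes driving straight at the obstacle; the filter slows it down.
    print(cbf_filter((2.0, 0.0), (-5.0, 0.0), (0.0, 0.0), 1.0))
```

In a learning setup, a filter of this kind would sit between the policy output and the actuators, so the agent can keep exploring while the filtered action respects the barrier condition.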
MA Oct 23, 2024
The Hive Mind is a Single Reinforcement Learning Agent
Karthik Soma, Yann Bouteiller, Heiko Hamann et al.
Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to optimal strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and individual trial-and-error. This paper establishes an equivalence between these two paradigms by drawing from the well-established collective decision-making model of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individual bees following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. The update rule through which this macro-agent learns is a bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of cognition-limited organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. From a biological perspective, this analysis suggests how such imitation strategies evolved: they constitute a scalable form of reinforcement learning at the group level, aligning with theories of kin and group selection. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. In swarm intelligence, our findings will inform the design of scalable collective systems in artificial domains, enabling RL-inspired mechanisms for coordination and adaptability at scale.
LG Oct 22, 2024
Evolution of Societies via Reinforcement Learning
Yann Bouteiller, Karthik Soma, Giovanni Beltrame
The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are typically constrained to small, homogeneous populations and remain computationally intensive. We propose a methodology that enables simulating populations of Reinforcement Learning agents at evolutionary scale. More specifically, we derive a fast, parallelizable implementation of Policy Gradient (PG) and Opponent-Learning Awareness (LOLA), tailored for evolutionary simulations where agents undergo random pairwise interactions in stateless normal-form games. We demonstrate our approach by simulating the evolution of very large populations made of heterogeneous co-learning agents, under both naive and advanced learning strategies. In our experiments, 200,000 PG or LOLA agents evolve in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game provides distinct insights into how populations evolve under both naive and advanced MARL rules, including compelling ways in which Opponent-Learning Awareness affects social evolution.
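The setting described here, independent policy-gradient learners meeting in random pairs in a stateless normal-form game, can be sketched in a few lines. This is a plain sequential REINFORCE loop for illustration only: it is not the fast parallel implementation derived in the paper, and it does not include LOLA. The Hawk-Dove payoff values (V = 2, C = 4), the learning rate, and all function names are assumptions.

```python
import math
import random

# Hawk-Dove payoffs for (own action, opponent action), with 1 = Hawk and 0 = Dove.
# Assumed values: resource V = 2, fight cost C = 4.
PAYOFF = {(1, 1): -1.0, (1, 0): 2.0, (0, 1): 0.0, (0, 0): 1.0}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_step(theta, action, reward, lr):
    """One REINFORCE update for a Bernoulli policy with logit theta.

    The gradient of log pi(action) with respect to theta is action - sigmoid(theta).
    """
    return theta + lr * reward * (action - sigmoid(theta))


def simulate(num_agents=1000, rounds=200, lr=0.05, seed=0):
    """Return each agent's probability of playing Hawk after training."""
    rng = random.Random(seed)
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(num_agents)]
    for _ in range(rounds):
        # Random pairwise matching: shuffle, then pair neighbours.
        order = list(range(num_agents))
        rng.shuffle(order)
        for i, j in zip(order[0::2], order[1::2]):
            a_i = 1 if rng.random() < sigmoid(thetas[i]) else 0
            a_j = 1 if rng.random() < sigmoid(thetas[j]) else 0
            thetas[i] = reinforce_step(thetas[i], a_i, PAYOFF[(a_i, a_j)], lr)
            thetas[j] = reinforce_step(thetas[j], a_j, PAYOFF[(a_j, a_i)], lr)
    return [sigmoid(t) for t in thetas]


if __name__ == "__main__":
    hawk_probs = simulate()
    print("mean probability of Hawk:", sum(hawk_probs) / len(hawk_probs))
```

Each agent keeps a single logit, so the whole population is a flat list of numbers; that flat structure is what makes a vectorized, parallel version of the same update plausible at the population sizes mentioned in the abstract.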
RO Jun 6, 2021
Collective transport via sequential caging
Vivek Shankar Vardharajan, Karthik Soma, Giovanni Beltrame
We propose a decentralized algorithm to collaboratively transport arbitrarily shaped objects using a swarm of robots. Our approach starts with a task allocation phase that sequentially distributes locations around the object to be transported starting from a seed robot that makes first contact with the object. Our approach does not require previous knowledge of the shape of the object to ensure caging. To push the object to a goal location, we estimate the robots required to apply force on the object based on the angular difference between the target and the object. During transport, the robots follow a sequence of intermediate goal locations specifying the required pose of the object at that location. We evaluate our approach in a physics-based simulator with up to 100 robots, using three generic paths. Experiments using a group of KheperaIV robots demonstrate the effectiveness of our approach in a real setting.
Keywords: Collaborative transport, Task Allocation, Caging, Robot Swarms