31.6ROMar 9
DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative TransportKazuki Shibata, Ryosuke Sota, Shandil Dhiresh Bosch et al.
Generalizing decentralized multi-robot cooperative transport across objects with diverse shapes and physical properties remains a fundamental challenge. Under decentralized execution, two key challenges arise: object-dependent representation learning under partial observability and coordination learning in multi-agent reinforcement learning (MARL) under non-stationarity. A typical approach jointly optimizes object-dependent representations and coordinated policies in an end-to-end manner while randomizing object shapes and physical properties during training. However, this joint optimization tightly couples representation and coordination learning, introducing bidirectional interference: inaccurate representations under partial observability destabilize coordination learning, while non-stationarity in MARL further degrades representation learning, resulting in sample-inefficient training. To address this structural coupling, we propose DeReCo, a novel MARL framework that decouples representation and coordination learning for object-adaptive multi-robot cooperative transport, improving sample efficiency and generalization across objects and transport scenarios. DeReCo adopts a three-stage training strategy: (1) centralized coordination learning with privileged object information, (2) reconstruction of object-dependent representations from local observations, and (3) progressive removal of privileged information for decentralized execution. This decoupling mitigates interference between representation and coordination learning and enables stable and sample-efficient training. Experimental results show that DeReCo outperforms baselines in simulation on three training objects, generalizes to six unseen objects with varying masses and friction coefficients, and achieves superior performance on two unseen objects in real-robot experiments.
ROMar 15, 2025
ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot ControlYoshiki Yano, Kazuki Shibata, Maarten Kokshoorn et al.
Recent advances in Large Language Models (LLMs) have permitted the development of language-guided multi-robot systems, which allow robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to (1) misalignment between instructions and task requirements and (2) inconsistency in robot behaviors when they independently interpret ambiguous instructions. To address these challenges, we propose Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency. The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors, further improving coordination. Simulation and real-world experiments validate the effectiveness of ICCO in achieving language-guided task-aligned multi-robot control. The demonstration can be found at https://yanoyoshiki.github.io/ICCO/.
ROApr 28, 2021
Development of global optimal coverage control using multiple aerial robotsKazuki Shibata, Tatsuya Miyano, Tomohiko Jimbo
Coverage control has been widely used for constructing mobile sensor network such as for environmental monitoring, and one of the most commonly used methods is the Lloyd algorithm based on Voronoi partitions. However, when this method is used, the result sometimes converges to a local optimum. To overcome this problem, game theoretic coverage control has been proposed and found to be capable of stochastically deriving the optimal deployment. From a practical point of view, however, it is necessary to make the result converge to the global optimum deterministically. In this paper, we propose a global optimal coverage control along with collision avoidance in continuous space that ensures multiple sensors can deterministically and smoothly move to the global optimal deployment. This approach consists of a cut-in algorithm based on neighborhood importance of measurement and a modified potential method for collision avoidance. The effectiveness of the proposed algorithm has been confirmed through numerous simulations and some experiments using multiple aerial robots.
ROApr 21, 2021
Robust shape estimation with false-positive contact detectionKazuki Shibata, Tatsuya Miyano, Tomohiko Jimbo et al.
We propose a means of omni-directional contact detection using accelerometers instead of tactile sensors for object shape estimation using touch. Unlike tactile sensors, our contact-based detection method tends to induce a degree of uncertainty with false-positive contact data because the sensors may react not only to actual contact but also to the unstable behavior of the robot. Therefore, it is crucial to consider a robust shape estimation method capable of handling such false-positive contact data. To realize this, we introduce the concept of heteroscedasticity into the contact data and propose a robust shape estimation algorithm based on Gaussian process implicit surfaces (GPIS). We confirmed that our algorithm not only reduces shape estimation errors caused by false-positive contact data but also distinguishes false-positive contact data more clearly than the GPIS through simulations and actual experiments using a quadcopter.
LGMar 29, 2021
Deep reinforcement learning of event-triggered communication and control for multi-agent cooperative transportKazuki Shibata, Tomohiko Jimbo, Takamitsu Matsubara
In this paper, we explore a multi-agent reinforcement learning approach to address the design problem of communication and control strategies for multi-agent cooperative transport. Typical end-to-end deep neural network policies may be insufficient for covering communication and control; these methods cannot decide the timing of communication and can only work with fixed-rate communications. Therefore, our framework exploits event-triggered architecture, namely, a feedback controller that computes the communication input and a triggering mechanism that determines when the input has to be updated again. Such event-triggered control policies are efficiently optimized using a multi-agent deep deterministic policy gradient. We confirmed that our approach could balance the transport performance and communication savings through numerical simulations.