Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks
This work addresses scalable, decentralized control for multi-agent systems such as robotic swarms, but the contribution is incremental because it builds on existing CNN and GNN methods.
The paper tackles decentralized multi-agent control by introducing a perception-action-communication loop based on Vision-based Graph Aggregation and Inference (VGAI), which combines a CNN for visual perception with a GNN for local communication. In a flocking application, VGAI performs comparably to or better than other decentralized controllers without access to precise location data.
In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addressing agent-level visual perception and feature learning, as well as swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training in smaller teams and application in larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.
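The CNN-to-GNN cascade and the imitation objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, layer sizes, and ring communication graph are hypothetical, a single linear map with ReLU stands in for the CNN, a generic polynomial graph filter stands in for the GNN layer, and a mean squared error against expert actions stands in for the imitation loss.

```python
import numpy as np

def cnn_features(images, W_feat):
    # Stand-in for the per-agent CNN (assumption): flatten each agent's image
    # and apply one shared linear map followed by ReLU.
    flat = images.reshape(images.shape[0], -1)
    return np.maximum(flat @ W_feat, 0.0)

def graph_filter(X, A, H):
    # Generic graph filter (assumption): sum_k A^k X H[k].
    # Each multiplication by A is one round of exchange with direct neighbors.
    out = np.zeros((X.shape[0], H.shape[2]))
    Z = X
    for k in range(H.shape[0]):
        out += Z @ H[k]
        Z = A @ Z
    return out

def vgai_actions(images, A, params):
    # Cascade: per-agent visual features -> neighborhood aggregation -> action readout.
    X = cnn_features(images, params["W_feat"])
    hidden = np.tanh(graph_filter(X, A, params["H"]))
    return hidden @ params["W_out"]

def imitation_loss(actions, expert_actions):
    # Mean squared error between learned and expert actions (assumed loss).
    return float(np.mean((actions - expert_actions) ** 2))

def ring_adjacency(n):
    # Hypothetical communication graph: each agent talks to two ring neighbors.
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return A

rng = np.random.default_rng(0)
params = {
    "W_feat": rng.normal(scale=0.1, size=(64, 16)),  # 8x8 image -> 16 features
    "H": rng.normal(scale=0.1, size=(3, 16, 8)),     # taps for 0, 1, 2 hops
    "W_out": rng.normal(scale=0.1, size=(8, 2)),     # 2-D action per agent
}

# The same parameters are applied to teams of different sizes.
for n in (5, 12):
    images = rng.random((n, 8, 8))
    actions = vgai_actions(images, ring_adjacency(n), params)
    print(n, actions.shape)
```

Because none of the parameter shapes depend on the number of agents, the same weights can be reused for a larger team, which is one way to read the scaling claim of training in smaller teams and deploying in larger ones. Each agent's output depends only on its own image and on messages from agents within a few hops, which is consistent with distributed execution.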