Deep Neural Network-based Cooperative Visual Tracking through Multiple Micro Aerial Vehicles
This work addresses the challenge of outdoor multi-camera pose capture for humans and animals using MAVs, offering a solution for applications like surveillance or wildlife monitoring, though it is incremental in improving DNN-based tracking through cooperation.
The paper tackles the problem of real-time, on-board person tracking using micro aerial vehicles (MAVs) with deep neural networks, which struggle with small or distant objects. By enabling cooperative tracking among multiple MAVs to predict poses and selectively process high-resolution images, the method achieves continuous tracking of distant humans with high accuracy and speed, as demonstrated in real robot experiments.
Multi-camera full-body pose capture of humans and animals in outdoor environments is a highly challenging problem. Our approach to it involves a team of cooperating micro aerial vehicles (MAVs) with on-board cameras only. The key enabling-aspect of our approach is the on-board person detection and tracking method. Recent state-of-the-art methods based on deep neural networks (DNN) are highly promising in this context. However, real time DNNs are severely constrained in input data dimensions, in contrast to available camera resolutions. Therefore, DNNs often fail at objects with small scale or far away from the camera, which are typical characteristics of a scenario with aerial robots. Thus, the core problem addressed in this paper is how to achieve on-board, real-time, continuous and accurate vision-based detections using DNNs for visual person tracking through MAVs. Our solution leverages cooperation among multiple MAVs. First, each MAV fuses its own detections with those obtained by other MAVs to perform cooperative visual tracking. This allows for predicting future poses of the tracked person, which are used to selectively process only the relevant regions of future images, even at high resolutions. Consequently, using our DNN-based detector we are able to continuously track even distant humans with high accuracy and speed. We demonstrate the efficiency of our approach through real robot experiments involving two aerial robots tracking a person, while maintaining an active perception-driven formation. Our solution runs fully on-board our MAV's CPU and GPU, with no remote processing. ROS-based source code is provided for the benefit of the community.