On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events
This work addresses efficient online learning for agile robots equipped with event cameras. The contribution is incremental: it improves the time and memory efficiency of an existing pipeline.
The paper tackles real-time, on-device self-supervised learning of monocular depth from event cameras on resource-restricted robots such as drones. Online learning yields more accurate depth estimates and better obstacle avoidance than pre-training alone, and benchmarks show state-of-the-art depth estimation performance among self-supervised approaches.
Event cameras provide low-latency perception for only milliwatts of power. This makes them highly suitable for resource-restricted, agile robots such as small flying drones. Self-supervised learning based on contrast maximization holds great potential for event-based robot vision, as it foregoes the need for high-frequency ground truth and allows for online learning in the robot's operational environment. However, online, on-board learning raises the major challenge of achieving sufficient computational efficiency for real-time learning, while maintaining competitive visual perception performance. In this work, we improve the time and memory efficiency of the contrast maximization pipeline, making on-device learning of low-latency monocular depth possible. We demonstrate that online learning on board a small drone yields more accurate depth estimates and more successful obstacle avoidance behavior compared to only pre-training. Benchmarking experiments show that the proposed pipeline is not only efficient, but also achieves state-of-the-art depth estimation performance among self-supervised approaches. Our work taps into the unused potential of online, on-device robot learning, promising smaller reality gaps and better performance.
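As a rough illustration of the contrast maximization idea the abstract builds on, the sketch below warps synthetic events to a reference time along a candidate motion, accumulates them into an image of warped events, and scores that image by its variance. It is a generic toy example and not the pipeline proposed in this work: the single constant flow vector, the synthetic moving edge, and all function and variable names are assumptions made for illustration. A depth-learning pipeline would obtain the warp from network predictions rather than from a fixed vector, and the abstract does not specify how.

```python
import numpy as np

def image_of_warped_events(x, y, t, flow, shape, t_ref=0.0):
    """Warp events to t_ref along a constant flow (px/s) and count them per pixel.

    Hypothetical helper for illustration; uses nearest-pixel accumulation.
    """
    vx, vy = flow
    xw = np.rint(x - vx * (t - t_ref)).astype(int)
    yw = np.rint(y - vy * (t - t_ref)).astype(int)
    h, w = shape
    inside = (xw >= 0) & (xw < w) & (yw >= 0) & (yw < h)
    iwe = np.zeros(shape, dtype=np.float64)
    # np.add.at accumulates correctly when several events land on the same pixel.
    np.add.at(iwe, (yw[inside], xw[inside]), 1.0)
    return iwe

def contrast(iwe):
    """Variance of the image of warped events; sharper images score higher."""
    return iwe.var()

# Synthetic events (assumed data): a vertical edge starting at x = 10
# and moving right at 20 px/s for one second.
rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(0.0, 1.0, n)
y = rng.integers(0, 32, n).astype(float)
x = 10.0 + 20.0 * t
shape = (32, 64)
true_flow = (20.0, 0.0)

c_true = contrast(image_of_warped_events(x, y, t, true_flow, shape))
c_zero = contrast(image_of_warped_events(x, y, t, (0.0, 0.0), shape))
print(f"contrast with true flow: {c_true:.2f}, with zero flow: {c_zero:.2f}")
```

Warping with the correct motion collapses the events back onto a single edge, so the accumulated image is sharper and its variance is higher than when no motion compensation is applied. Because this score needs no ground truth, it can act as a self-supervised training signal.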