SeDAR - Semantic Detection and Ranging: Humans can localise without LiDAR, can robots?
This work addresses the challenge of enabling robots to localise more like humans, potentially reducing reliance on expensive sensors such as LiDAR, though it appears incremental because it builds on existing semantic methods.
The paper tackles global robot localisation by proposing a method that uses semantic cues from floorplans and RGB images instead of traditional depth measurements, achieving results comparable to state-of-the-art approaches without relying on LiDAR.
How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Similarly, we do not extrude the 2D geometry present in the floorplan into 3D and try to align it to the real world. And yet, this is how most vision-based approaches localise. Humans do the exact opposite. Instead of depth, we use high-level semantic cues. Instead of extruding the floorplan up into the third dimension, we collapse the 3D world into a 2D representation. Evidence of this is that many of the floorplans we use in everyday life are not accurate, emphasising instead highly discriminative landmarks. In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary, as we can achieve results comparable to the state of the art without them.
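To make the idea of matching semantic labels against pose estimates concrete, the following Python sketch scores candidate poses in a labelled floorplan grid by comparing the labels a ray would hit with the labels observed along the same bearings. It is a toy illustration only and not the method of this work: the grid layout, the label codes, the function names (`cast_label`, `semantic_score`, `best_pose`) and the exact-match scoring rule are all assumptions made for the example, and the observed labels are supplied by hand rather than extracted from RGB images.

```python
import math

# Hypothetical label codes for a toy floorplan grid (assumed, not from the paper).
FREE, WALL, DOOR, WINDOW = 0, 1, 2, 3

def cast_label(grid, x, y, theta, max_range=20.0, step=0.05):
    """March along a ray from (x, y) and return the first non-free label hit."""
    r = 0.0
    while r < max_range:
        cx = int(math.floor(x + r * math.cos(theta)))
        cy = int(math.floor(y + r * math.sin(theta)))
        if not (0 <= cy < len(grid) and 0 <= cx < len(grid[0])):
            return None  # the ray left the map without hitting anything
        if grid[cy][cx] != FREE:
            return grid[cy][cx]
        r += step
    return None

def semantic_score(grid, pose, bearings, observed):
    """Fraction of bearings whose expected label equals the observed label."""
    x, y, heading = pose
    expected = [cast_label(grid, x, y, heading + b) for b in bearings]
    matches = sum(e == o for e, o in zip(expected, observed))
    return matches / len(observed)

def best_pose(grid, candidates, bearings, observed):
    """Pick the candidate pose whose expected labels best agree with the observation."""
    return max(candidates, key=lambda p: semantic_score(grid, p, bearings, observed))

# Toy room: walls on the border, a window in the top wall, a door in the right wall.
grid = [
    [WALL, WALL, WINDOW, WALL, WALL],
    [WALL, FREE, FREE,   FREE, WALL],
    [WALL, FREE, FREE,   FREE, DOOR],
    [WALL, FREE, FREE,   FREE, WALL],
    [WALL, WALL, WALL,   WALL, WALL],
]

# Four bearings relative to the heading; the labels are only ever compared, never ranged.
bearings = [0.0, math.pi / 2, math.pi, -math.pi / 2]
observed = [DOOR, WALL, WALL, WINDOW]  # hand-written stand-in for image-derived labels

candidates = [
    (2.5, 2.5, 0.0),
    (2.5, 2.5, math.pi / 2),
    (1.5, 1.5, 0.0),
]

estimate = best_pose(grid, candidates, bearings, observed)
print(estimate, semantic_score(grid, estimate, bearings, observed))
```

Note that no depth value is ever compared in this sketch: the ray is used only to look up which label lies in a given direction, which mirrors the claim that range measurements are optional. A practical system would need a probabilistic score that tolerates labelling errors rather than the exact-match count assumed here.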