CV · RO · May 3, 2022

3D Semantic Scene Perception using Distributed Smart Edge Sensors

arXiv:2205.01460v1 · 6 citations · h-index: 57
Originality: Incremental advance
AI Analysis

This work addresses privacy and bandwidth concerns in surveillance and smart environments by enabling real-time 3D scene understanding. The advance is incremental, as it builds on existing edge computing and CNN methods.

The paper tackles 3D semantic scene perception with a system of distributed smart edge sensors that run CNN inference on-device. Processing images locally and streaming only semantic data reduces both bandwidth and privacy risks. The system achieves real-time performance in multi-person scenes, though no concrete numbers are provided.

We present a system for 3D semantic scene perception consisting of a network of distributed smart edge sensors. The sensor nodes are based on an embedded CNN inference accelerator and RGB-D and thermal cameras. Efficient vision CNN models for object detection, semantic segmentation, and human pose estimation run on-device in real time. 2D human keypoint estimations, augmented with the RGB-D depth estimate, as well as semantically annotated point clouds are streamed from the sensors to a central backend, where multiple viewpoints are fused into an allocentric 3D semantic scene model. As the image interpretation is computed locally, only semantic information is sent over the network. The raw images remain on the sensor boards, significantly reducing the required bandwidth, and mitigating privacy risks for the observed persons. We evaluate the proposed system in challenging real-world multi-person scenes in our lab. The proposed perception system provides a complete scene view containing semantically annotated 3D geometry and estimates 3D poses of multiple persons in real time.
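The abstract describes 2D keypoints augmented with depth and fused across viewpoints into an allocentric model. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a pinhole camera model, known camera-to-world extrinsics per sensor, and a simple confidence-weighted average as the fusion step. All names (`Intrinsics`, `Extrinsics`, `backproject`, `camera_to_world`, `fuse_views`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Assumed pinhole intrinsics of one sensor (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class Extrinsics:
    """Assumed camera-to-world pose: 3x3 rotation (row-major) and translation in metres."""
    rotation: Sequence[Sequence[float]]
    translation: Vec3


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """Lift a 2D keypoint (u, v) with a depth reading to a 3D point in the camera frame."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def camera_to_world(p: Vec3, pose: Extrinsics) -> Vec3:
    """Transform a camera-frame point into the shared (allocentric) world frame."""
    r, t = pose.rotation, pose.translation
    return tuple(sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def fuse_views(points: Sequence[Vec3], confidences: Sequence[float]) -> Vec3:
    """Fuse world-frame estimates of one keypoint by confidence-weighted averaging.

    This is an illustrative stand-in for the fusion step, not the paper's method.
    """
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one view needs a positive confidence")
    return tuple(
        sum(c * p[i] for p, c in zip(points, confidences)) / total
        for i in range(3)
    )
```

In this sketch only the keypoint coordinates, depth, and confidence would need to leave a sensor; the image itself is never passed to `fuse_views`, which mirrors the bandwidth and privacy argument made above.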

Code Implementations: 2 repos
