CV, Mar 18, 2024

EffiPerception: an Efficient Framework for Various Perception Tasks

Xinhao Xiang, Simon Dräger, Jiawei Zhang

arXiv:2403.12317v1, 3.71 citations, h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses efficiency challenges for computer vision practitioners, but it appears incremental as it builds on existing methods with optimizations.

The authors tackled the accuracy-speed-memory trade-off in computer vision perception tasks by proposing the EffiPerception framework, which achieved improved overall performance across 2D/3D object detection and segmentation tasks on datasets like KITTI and COCO compared to earlier methods.

The accuracy-speed-memory trade-off is a central concern in many computer vision perception tasks. Previous methods mainly focus on one or a few aspects of the problem, such as designing effective data augmentation, feature extractors, or learning strategies. These approaches, however, can be inherently task-specific: a model's performance may depend on a particular perception task or dataset. To explore common learning patterns and increase module robustness, we propose the EffiPerception framework. It achieves strong accuracy-speed performance at relatively low memory cost across several perception tasks: 2D Object Detection, 3D Object Detection, 2D Instance Segmentation, and 3D Point Cloud Segmentation. The framework consists of three parts: (1) Efficient Feature Extractors, which extract the input features for each modality; (2) Efficient Layers, plug-in plug-out layers that further process the feature representation, aggregating core learned information while pruning noisy proposals; and (3) EffiOptim, an 8-bit optimizer that further cuts computational cost and improves performance stability. Extensive experiments on the KITTI, semantic-KITTI, and COCO datasets show that EffiPerception improves the overall accuracy-speed-memory trade-off on these four detection and segmentation tasks compared with earlier, well-respected methods.
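
To give a rough sense of why an 8-bit optimizer can save memory, the sketch below stores a small optimizer state buffer as signed 8-bit codes plus one floating-point scale, using plain absmax quantization. This is a generic illustration, not the EffiOptim implementation, which the abstract does not describe; the function names `quantize_absmax_int8` and `dequantize_int8` and the example buffer are hypothetical.

```python
from typing import List, Tuple


def quantize_absmax_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integer codes in [-127, 127] using one absmax scale.

    Assumption: a single scale for the whole buffer; real 8-bit optimizers
    typically quantize in blocks, each with its own scale.
    """
    absmax = max((abs(v) for v in values), default=0.0)
    if absmax == 0.0:
        # All-zero (or empty) buffer: any scale works, use 1.0.
        return [0] * len(values), 1.0
    scale = absmax / 127.0
    # Round to the nearest code and clamp to the signed 8-bit range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from 8-bit codes and their scale."""
    return [c * scale for c in codes]


# Hypothetical optimizer state, e.g. a momentum buffer.
state = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_absmax_int8(state)
restored = dequantize_int8(codes, scale)

# Each value is recovered to within half a quantization step.
max_error = max(abs(a - b) for a, b in zip(state, restored))
print(codes, scale, max_error)
```

Each code fits in one byte instead of the four bytes of a 32-bit float, at the price of a rounding error bounded by half the scale per value.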
