CV · Dec 3, 2024

Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable

Lizhen Xu, Zehao Wu, Wenzhao Qiu, Shanmin Pang, Xiuxiu Bai, Kuizhi Mei, Jianru Xue

arXiv:2412.02054v3 · 7.66 citations · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency issues for users of query-based 3D detectors, particularly in resource-constrained environments, but it is incremental because it builds on existing methods.

The paper tackles the problem of redundant object queries in DETR-based 3D detection methods, which cause unnecessary computational and memory costs, and proposes GPQ to prune queries incrementally, achieving up to 1.35x inference speedup on desktop GPUs and reducing FLOPs by 67.86% on edge devices while maintaining performance.

Query-based models are extensively used in 3D object detection tasks, with a wide range of pre-trained checkpoints readily available online. However, despite their popularity, these models often require an excessive number of object queries, far surpassing the actual number of objects to detect. The redundant queries result in unnecessary computational and memory costs. In this paper, we find that not all queries contribute equally -- a significant portion of queries have a much smaller impact compared to others. Based on this observation, we propose an embarrassingly simple approach called Gradually Pruning Queries (GPQ), which prunes queries incrementally based on their classification scores. A key advantage of GPQ is that it requires no additional learnable parameters. It is straightforward to implement in any query-based method, as it can be seamlessly integrated as a fine-tuning step using an existing checkpoint after training. With GPQ, users can easily generate multiple models with fewer queries, starting from a checkpoint with an excessive number of queries. Experiments on various advanced 3D detectors show that GPQ effectively reduces redundant queries while maintaining performance. Using our method, model inference on desktop GPUs can be accelerated by up to 1.35x. Moreover, after deployment on edge devices, it achieves up to a 67.86% reduction in FLOPs and a 65.16% decrease in inference time. The code will be available at https://github.com/iseri27/Gpq.
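The gradual pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The assumptions are: each query has a single scalar classification score, the lowest-scoring queries are the ones removed, and a fixed number of queries is dropped per pruning step. The names `prune_step`, `gradual_prune`, `target`, and `per_step` are hypothetical. In an actual fine-tuning run, training iterations would take place between pruning steps and the scores would be re-estimated; here they are held fixed for clarity.

```python
from typing import List, Sequence, Tuple


def prune_step(kept: List[int], scores: Sequence[float], num_to_prune: int) -> List[int]:
    """Drop the `num_to_prune` lowest-scoring queries from `kept` (assumed criterion)."""
    # Rank surviving query indices by score, lowest first; ties are broken by index.
    ranked = sorted(kept, key=lambda i: (scores[i], i))
    dropped = set(ranked[:num_to_prune])
    # Preserve the original ordering of the surviving queries.
    return [i for i in kept if i not in dropped]


def gradual_prune(scores: Sequence[float], target: int, per_step: int) -> Tuple[List[int], List[int]]:
    """Prune queries in small steps until only `target` remain.

    Returns the surviving query indices and the query count after each step.
    """
    kept = list(range(len(scores)))
    schedule = [len(kept)]
    while len(kept) > target:
        # Never prune below the target count.
        k = min(per_step, len(kept) - target)
        kept = prune_step(kept, scores, k)
        # Hypothetical: fine-tuning iterations and score updates would go here.
        schedule.append(len(kept))
    return kept, schedule


# Toy example: six queries, pruned two at a time down to two.
toy_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
survivors, counts = gradual_prune(toy_scores, target=2, per_step=2)
print(survivors, counts)
```

Recording the query count after every step reflects the point made in the abstract that one over-provisioned checkpoint can yield several smaller models: each intermediate count is a candidate model size.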
