Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
This work addresses the challenge of vast vocabulary object detection in computer vision, but it appears incremental, since it builds on existing supervised detectors with targeted adjustments.
The researchers tackled the problem of detecting objects across the very large and complex category set of the V3Det dataset, along with its detection boxes. They designed improvements to the network structure, the loss function, and the training strategy, which improved performance over the baseline and earned strong rankings on the V3Det Challenge 2024 leaderboards.
In this technical report, we present our findings from research on the Vast Vocabulary Visual Detection (V3Det) dataset for the Supervised Vast Vocabulary Visual Detection task. Handling the complex categories and detection boxes is the central difficulty in this track, and the original supervised detector is not well suited to the task. We therefore designed a series of improvements, including adjustments to the network structure, changes to the loss function, and new training strategies. Our model improves over the baseline and achieved excellent rankings on the leaderboards for both the Vast Vocabulary Object Detection (Supervised) track and the Open Vocabulary Object Detection (OVD) track of the V3Det Challenge 2024.