SBNet: Segmentation-based Network for Natural Language-based Vehicle Search
This paper addresses the problem of retrieving vehicles from images using natural language descriptions, which is useful for applications such as police searches. The contribution appears incremental, however, since it builds on existing methods with specific enhancements.
The paper tackles natural language-based vehicle retrieval by proposing SBNet, a deep neural network that uses segmentation and task-specific modules, achieving a significant improvement over the baseline in the AI City Challenge 2021.
Natural language-based vehicle retrieval is the task of finding a target vehicle within a given image based on a natural language description used as a query. This technology can be applied in various areas, including police searches for a suspect vehicle. However, it is challenging due to the ambiguity of language descriptions and the difficulty of processing multi-modal data. To tackle this problem, we propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval. We also propose two task-specific modules to improve performance: a substitution module that helps embed features from different domains in the same space, and a future prediction module that learns temporal information. SBNet was trained on the CityFlow-NL dataset, which contains 2,498 vehicle tracks with three unique natural language descriptions each, and tested on 530 unique vehicle tracks and their corresponding query sets. SBNet achieved a significant improvement over the baseline in the natural language-based vehicle tracking track of the AI City Challenge 2021.
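To make the retrieval setting concrete, the following is a minimal sketch of how candidate vehicle tracks might be ranked against a language query once both are represented as vectors in a shared space. It is not SBNet's method, which is segmentation-based; it is a generic cosine-similarity ranking used purely for illustration. The function names `cosine` and `rank_tracks`, the track ids, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_tracks(query_vec, track_vecs):
    """Return track ids ordered from most to least similar to the query vector.

    query_vec:  hypothetical embedding of the language description
    track_vecs: dict mapping a track id to a hypothetical embedding of that track
    """
    scores = {tid: cosine(query_vec, vec) for tid, vec in track_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed values, not taken from CityFlow-NL).
query = [1.0, 0.0, 1.0]
tracks = {
    "track_a": [0.9, 0.1, 0.8],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [0.5, 0.5, 0.5],
}
ranking = rank_tracks(query, tracks)
print(ranking)
```

In this toy setup the track whose vector points in nearly the same direction as the query is ranked first, and the orthogonal one last. A real system would obtain the vectors from learned text and visual encoders rather than hand-written lists.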