A Dynamic Transformer Network for Vehicle Detection
This work addresses vehicle detection for traffic systems, but it appears incremental as it builds on existing deep learning and Transformer methods without claiming major breakthroughs.
The paper tackles vehicle detection by proposing DTNet, a dynamic Transformer network that uses dynamic convolution, mixed attention, and translation-variant convolution to enhance adaptability and extract salient information, achieving competitive results as demonstrated in experiments.
Stable consumer electronic systems can better assist traffic. Good traffic consumer electronic systems require traffic algorithms and hardware to work together. However, popular traffic algorithms rely on deep-network vehicle detection methods that learn relations in the data rather than the differences caused by varying lighting and occlusion, which limits their performance. In this paper, we present a dynamic Transformer network for vehicle detection (DTNet). DTNet uses a dynamic convolution to guide a deep network to generate weights dynamically, which enhances the adaptability of the obtained detector. To take the relations among different kinds of information into account, a mixed attention mechanism based on channel attention and a Transformer is exploited to strengthen the relations among channels and pixels and to extract more salient information for vehicle detection. To account for differences within an image, a translation-variant convolution relies on spatial location information to refine the obtained structural information for vehicle detection. Experimental results show that DTNet is competitive for vehicle detection. The code of the proposed DTNet is available at https://github.com/hellloxiaotian/DTNet.
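To make the idea of dynamically generated weights concrete, the following is a minimal sketch of one common formulation of dynamic convolution, in which an input-dependent softmax mixes several candidate kernels before the convolution is applied. It is not taken from the DTNet code: the routing layer `router`, the number of candidate kernels, the restriction to 1x1 kernels, and all function names are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def global_avg_pool(x):
    """Average each channel of x (C channels, each an H x W nested list)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def kernel_attention(x, router):
    """Input-dependent weights over K candidate kernels.

    router is a hypothetical K x C linear layer applied to the pooled input;
    it stands in for whatever weight-generating branch a real model uses.
    """
    pooled = global_avg_pool(x)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in router]
    return softmax(logits)


def dynamic_conv1x1(x, kernels, router):
    """Mix K candidate 1x1 kernels per input, then apply the mixed kernel.

    kernels has shape K x C_out x C_in. Returns (output, attention), where
    output has shape C_out x H x W.
    """
    attention = kernel_attention(x, router)
    c_out, c_in = len(kernels[0]), len(kernels[0][0])
    # Aggregate the candidate kernels into a single input-specific kernel.
    mixed = [
        [sum(a * k[o][i] for a, k in zip(attention, kernels)) for i in range(c_in)]
        for o in range(c_out)
    ]
    height, width = len(x[0]), len(x[0][0])
    # A 1x1 convolution is a per-pixel linear map across channels.
    output = [
        [
            [sum(mixed[o][i] * x[i][r][c] for i in range(c_in)) for c in range(width)]
            for r in range(height)
        ]
        for o in range(c_out)
    ]
    return output, attention


if __name__ == "__main__":
    # Two toy inputs whose energy sits in different channels.
    first = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
    second = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    kernels = [[[1.0, 0.0]], [[0.0, 1.0]]]  # K=2, C_out=1, C_in=2
    router = [[4.0, 0.0], [0.0, 4.0]]       # hypothetical routing weights
    for name, sample in (("first", first), ("second", second)):
        _, attention = dynamic_conv1x1(sample, kernels, router)
        print(name, [round(a, 3) for a in attention])
```

In this sketch the effective kernel differs from input to input even though the candidate kernels are fixed, which is the sense in which the weights are generated dynamically. A practical detector would use spatial kernels and learned routing parameters; the released DTNet repository is the reference for the actual design.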