UTA-Sign: Unsupervised Thermal Video Augmentation via Event-Assisted Traffic Signage Sketching
This work addresses safety risks in autonomous driving systems by enhancing signage perception in low-illumination environments, though it is incremental as it builds on existing multimodal fusion techniques.
The paper tackled the problem of accurately perceiving traffic signage in low-light conditions by fusing thermal and event camera data, resulting in improved detection accuracy and superior quality in traffic signage sketching as validated on real-world datasets.
The thermal camera excels at perceiving outdoor environments under low-light conditions, making it ideal for applications such as nighttime autonomous driving and unmanned navigation. However, thermal cameras encounter challenges when capturing signage from objects made of similar materials, which can pose safety risks for accurately understanding semantics in autonomous driving systems. In contrast, the neuromorphic vision camera, also known as an event camera, detects changes in light intensity asynchronously and has proven effective in high-speed, low-light traffic environments. Recognizing the complementary characteristics of these two modalities, this paper proposes UTA-Sign, an unsupervised thermal-event video augmentation for traffic signage in low-illumination environments, targeting elements such as license plates and roadblock indicators. To address the signage blind spots of thermal imaging and the non-uniform sampling of event cameras, we developed a dual-boosting mechanism that fuses thermal frames and event signals for consistent signage representation over time. The proposed method utilizes thermal frames to provide accurate motion cues as temporal references for aligning the uneven event signals. At the same time, event signals contribute subtle signage content to the raw thermal frames, enhancing the overall understanding of the environment. The proposed method is validated on datasets collected from real-world scenarios, demonstrating superior quality in traffic signage sketching and improved detection accuracy at the perceptual level.