Robust Object Detection via Instance-Level Temporal Cycle Confusion
This work addresses the challenge of reliable object detection under domain shift in real-world applications. It is an incremental advance built on a novel self-supervised method.
The paper improves object detector robustness to domain shifts by introducing a self-supervised task, instance-level temporal cycle confusion (CycConf). The task yields consistent out-of-domain performance improvements and establishes a new state of the art on standard unsupervised domain adaptation benchmarks.
Building reliable object detectors that are robust to domain shifts, such as changes in context, viewpoint, and object appearance, is critical for real-world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks in improving the out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf), which operates on the region features of the object detector. For each object, the task is to find the most different object proposals in the adjacent frame of a video and then cycle back to the object itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motions, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on large-scale video datasets (BDD100K and Waymo open data). The joint training framework also establishes a new state of the art on standard unsupervised domain adaptive detection benchmarks (Cityscapes, Foggy Cityscapes, and Sim10K). The code and models are available at https://github.com/xinw1012/cycle-confusion.
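To make the CycConf task described above concrete, the following is a minimal sketch in plain Python, not the released implementation. It assumes each object proposal is already summarized by a region feature vector. The function name `cycle_confusion_loss`, the `temperature` parameter, the use of cosine similarity, and the soft-attention formulation are all illustrative assumptions. In particular, the abstract does not say how the return step is scored; this sketch assumes an ordinary similarity softmax on the way back.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cycle_confusion_loss(feats_t, feats_next, temperature=0.1):
    """Hypothetical sketch of a cycle-confusion style loss.

    feats_t:    region features of the objects in frame t
    feats_next: region features of the proposals in the adjacent frame
    """
    a = [l2_normalize(f) for f in feats_t]
    b = [l2_normalize(f) for f in feats_next]
    dim = len(a[0])
    loss = 0.0
    for i, a_i in enumerate(a):
        # Forward step: attend to the MOST DIFFERENT proposals in the
        # adjacent frame by negating the similarity before the softmax.
        w = softmax([-dot(a_i, b_j) / temperature for b_j in b])
        b_hat = [sum(w_j * b_j[d] for w_j, b_j in zip(w, b)) for d in range(dim)]
        # Return step (assumption): score the frame-t objects by ordinary
        # similarity to the attended feature.
        p = softmax([dot(b_hat, a_k) / temperature for a_k in a])
        # Self-supervision: the cycle should land back on object i.
        loss += -math.log(max(p[i], 1e-12))
    return loss / len(a)
```

With two orthogonal objects that reappear unchanged in the next frame, the forward step deliberately jumps to the other object, so the untrained loss is large; training the features to reduce it is what would push the detector toward structure shared across instances. In a joint training setup, a term like this would be added to the usual detection loss with a weighting factor, which is also an assumption here.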