CV · Feb 4

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

arXiv:2602.04583v1 · h-index: 17
AI Analysis

This work addresses domain shift, a key obstacle to deploying deep visual models in the real world, and offers a training-time approach that improves model robustness without sacrificing semantic detail.

The paper tackles domain generalization in visual perception with a cross-modal framework that uses event cameras as privileged information, available during training only. The resulting RGB-only model is more robust to domain shifts such as day-to-night changes and consistently outperforms alignment-based baselines in object detection and semantic segmentation.

Deep neural networks for visual perception are highly susceptible to domain shift, which poses a critical challenge for real-world deployment under conditions that differ from the training data. To address this domain generalization challenge, we propose a cross-modal framework under the learning using privileged information (LUPI) paradigm for training a robust, single-modality RGB model. We leverage event cameras as a source of privileged information, available only during training. The two modalities exhibit complementary characteristics: the RGB stream is semantically dense but domain-dependent, whereas the event stream is sparse yet more domain-invariant. Direct feature alignment between them is therefore suboptimal, as it forces the RGB encoder to mimic the sparse event representation, thereby losing semantic detail. To overcome this, we introduce Privileged Event-based Predictive Regularization (PEPR), which reframes LUPI as a predictive problem in a shared latent space. Instead of enforcing direct cross-modal alignment, we train the RGB encoder with PEPR to predict event-based latent features, distilling robustness without sacrificing semantic richness. The resulting standalone RGB model consistently improves robustness to day-to-night and other domain shifts, outperforming alignment-based baselines across object detection and semantic segmentation.
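The contrast between direct alignment and prediction can be illustrated with a toy example. The sketch below is not the paper's architecture or loss: the four-dimensional latents, the linear predictor `W`, the mean-squared-error objective, and all function names are assumptions made purely for illustration. It shows that a dense RGB latent can predict a sparse event latent exactly through a predictor head, while a direct alignment penalty on the same pair would push the RGB latent itself toward the sparse target.

```python
# Toy illustration only: vectors, predictor, and MSE are assumed, not taken from the paper.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def linear(weights, z):
    """Apply a linear map (list of rows) to a vector z."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]

def alignment_loss(z_rgb, z_evt):
    """Direct alignment: the RGB latent itself must match the event latent."""
    return mse(z_rgb, z_evt)

def predictive_loss(z_rgb, z_evt, weights):
    """Predictive objective: a head on top of the RGB latent must match the event latent."""
    return mse(linear(weights, z_rgb), z_evt)

# Hypothetical latents: the RGB latent is dense, the event latent is sparse.
z_rgb = [1.0, 2.0, 3.0, 4.0]
z_evt = [1.0, 0.0, 0.0, 4.0]

# Hypothetical predictor that keeps only the components the event latent carries.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

align = alignment_loss(z_rgb, z_evt)      # nonzero: penalizes the dense components
pred = predictive_loss(z_rgb, z_evt, W)   # zero: the dense latent is left untouched
```

In this toy setting the alignment penalty is driven entirely by the two components that the event latent lacks, so minimizing it would erase them from the RGB latent. The predictive penalty is already zero with the RGB latent unchanged, which matches the abstract's intuition that prediction can transfer robustness without discarding semantic detail.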

