CV | Feb 21

Robust Self-Supervised Cross-Modal Super-Resolution against Real-World Misaligned Observations

Xiaoyu Dong, Jiahuan Li, Ziteng Cui, Naoto Yokoya

arXiv:2602.18822v11.5 | h-index: 8 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of enhancing low-resolution images using misaligned high-resolution guide images, with applications in computer vision. The contribution is incremental, as it builds on existing cross-modal SR methods.

The paper tackles cross-modal super-resolution on real-world misaligned data by proposing RobSelf, a fully self-supervised model that achieves state-of-the-art performance and superior efficiency without requiring training data or ground-truth supervision.

We study cross-modal super-resolution (SR) on real-world misaligned data, where only a limited number of low-resolution (LR) source and high-resolution (HR) guide image pairs with complex spatial misalignments are available. To address this challenge, we propose RobSelf--a fully self-supervised model that is optimized online, requiring no training data, ground-truth supervision, or pre-alignment. RobSelf features two key techniques: a misalignment-aware feature translator and a content-aware reference filter. The translator reformulates unsupervised cross-modal and cross-resolution alignment as a weakly-supervised, misalignment-aware translation subtask, producing an aligned guide feature with inherent redundancy. Guided by this feature, the filter performs reference-based discriminative self-enhancement on the source, enabling SR predictions with high resolution and high fidelity. Across a variety of tasks, we demonstrate that RobSelf achieves state-of-the-art performance and superior efficiency. Additionally, we introduce a real-world dataset, RealMisSR, to advance research on this topic. Dataset and code: https://github.com/palmdong/RobSelf.
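The abstract does not state the objective RobSelf optimizes. One common way to supervise an SR prediction without ground truth is to require that the prediction reproduces the LR source once it is degraded again. The sketch below illustrates only that general idea in plain Python, assuming average pooling as the degradation; the function names `avg_pool_downsample`, `nearest_upsample`, and `consistency_loss` are hypothetical and are not taken from the RobSelf code.

```python
def avg_pool_downsample(img, scale):
    """Downsample a 2-D image (list of rows) by averaging scale x scale blocks.

    Assumption: average pooling stands in for the unknown real degradation.
    """
    h, w = len(img), len(img[0])
    if h % scale or w % scale:
        raise ValueError("image size must be divisible by scale")
    out = []
    for i in range(0, h, scale):
        row = []
        for j in range(0, w, scale):
            block = [img[i + di][j + dj]
                     for di in range(scale) for dj in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def nearest_upsample(img, scale):
    """Upsample by repeating every pixel scale times along both axes."""
    return [[v for v in row for _ in range(scale)]
            for row in img for _ in range(scale)]


def consistency_loss(sr, lr, scale):
    """Mean squared error between the re-degraded SR prediction and the LR source."""
    down = avg_pool_downsample(sr, scale)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(down, lr)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy 2x2 LR source and a trivial 4x4 "prediction" built from it.
lr = [[0.0, 1.0], [2.0, 3.0]]
sr = nearest_upsample(lr, 2)
print(consistency_loss(sr, lr, 2))
```

A loss of this kind constrains only the low-frequency content of the prediction, so on its own it cannot recover fine detail. In a guided setting, that detail would have to come from the HR guide.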
