SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning
This work addresses distortion correction in fisheye images for computer vision applications, though the contribution appears incremental because it builds on existing self-supervised learning and ViT techniques.
The paper tackles fisheye image rectification by introducing SimFIR, a framework that uses self-supervised representation learning to exploit distortion patterns. The authors report superior performance over state-of-the-art methods and strong generalization to real-world images.
In fisheye images, rich and distinct distortion patterns are regularly distributed in the image plane. These distortion patterns are independent of the visual content and provide informative cues for rectification. To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning. Technically, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer (ViT). To learn fine-grained distortion representations, we then associate different image patches with their specific distortion patterns based on the fisheye model, and design a unified distortion-aware pretext task for learning them. The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations. Extensive experiments are conducted, and the quantitative and qualitative results demonstrate the superiority of our method over state-of-the-art algorithms as well as its strong generalization ability on real-world fisheye images.
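The patch-splitting and patch-to-distortion association described above can be illustrated with a minimal sketch. The abstract does not specify how patches are labeled, so this sketch assumes a radially symmetric fisheye model in which distortion depends only on distance from the image center, and assigns each patch a ring index accordingly. The function names `split_into_patches` and `radial_ring_labels`, and the parameter `num_rings`, are hypothetical and are not taken from the paper.

```python
import math


def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, as a ViT-style tokenizer would.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def radial_ring_labels(grid_size, num_rings):
    """Label each patch of a grid_size x grid_size grid with a ring index.

    Assumption (not from the paper): distortion is radially symmetric, so
    patches whose centers are equally far from the image center share a label.
    Ring 0 is the innermost ring; num_rings - 1 is the outermost.
    """
    center = grid_size / 2.0
    # Distance from the image center to the center of a corner patch.
    max_radius = math.hypot(center - 0.5, center - 0.5)
    labels = []
    for row in range(grid_size):
        for col in range(grid_size):
            radius = math.hypot(row + 0.5 - center, col + 0.5 - center)
            ring = min(int(radius / max_radius * num_rings), num_rings - 1)
            labels.append(ring)
    return labels


if __name__ == "__main__":
    # Toy 4x4 "image" whose pixel value encodes its position.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    patches = split_into_patches(image, patch_size=2)
    labels = radial_ring_labels(grid_size=4, num_rings=2)
    print(len(patches), labels)
```

In a self-supervised setup of this kind, such position-derived labels could serve as targets for a classification-style pretext task on the patch representations, since they depend on the patch location rather than on the visual content. Whether SimFIR uses this exact formulation is not stated in the abstract.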