CV · IV · Nov 30, 2020

SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses

arXiv:2011.14611v2 · 17 citations
AI Analysis

This work provides a self-supervised solution for improving the generalization of image rectification models, which is beneficial for applications using real-world fisheye camera data where ground-truth distortion parameters are scarce.

This paper addresses the problem of image rectification for fisheye images, where existing deep learning methods struggle with generalization from synthetic data to real-world scenarios. The authors propose a self-supervised method (SIR) that leverages the consistency of rectified images from different lenses viewing the same scene, achieving comparable or better performance than supervised baselines on both synthetic and real-world datasets.

Deep learning has demonstrated its power in image rectification by leveraging the representation capacity of deep neural networks via supervised training on a large-scale synthetic dataset. However, the model may overfit the synthetic images and generalize poorly to real-world fisheye images, due to the limited universality of a specific distortion model and the lack of explicit modeling of the distortion and rectification process. In this paper, we propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene taken through different lenses should be the same. Specifically, we devise a new network architecture with a shared encoder and several prediction heads, each of which predicts the distortion parameter of a specific distortion model. We further leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters and exploit the intra- and inter-model consistency between them during training, leading to a self-supervised learning scheme that needs neither ground-truth distortion parameters nor normal images. Experiments on a synthetic dataset and real-world fisheye images demonstrate that our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods. Self-supervised learning also improves the universality of distortion models while keeping their self-consistency.
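
The core insight in the abstract (views of the same scene through different lenses should rectify to the same result) can be illustrated with a toy sketch. This is not the authors' implementation: it works on 2D point coordinates instead of images, it assumes a one-parameter division distortion model chosen here only for simplicity, and the names `rectify`, `distort`, and `consistency_loss` are hypothetical. In the paper a network predicts the distortion parameters; here they are passed in by hand to show how the consistency signal behaves.

```python
import math

def rectify(points, k):
    """Undistort points with an assumed one-parameter division model:
    r_u = r_d / (1 + k * r_d^2). The model choice is an assumption of this sketch."""
    out = []
    for x, y in points:
        scale = 1.0 / (1.0 + k * (x * x + y * y))
        out.append((x * scale, y * scale))
    return out

def distort(points, k):
    """Inverse of rectify: solve k*r_u*r_d^2 - r_d + r_u = 0 for r_d."""
    out = []
    for x, y in points:
        r_u = math.hypot(x, y)
        if k == 0.0 or r_u == 0.0:
            out.append((x, y))  # no distortion, or the image center
            continue
        r_d = (1.0 - math.sqrt(1.0 - 4.0 * k * r_u * r_u)) / (2.0 * k * r_u)
        out.append((x * r_d / r_u, y * r_d / r_u))
    return out

def consistency_loss(points_a, points_b):
    """Mean squared distance between corresponding points."""
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2
                for (xa, ya), (xb, yb) in zip(points_a, points_b))
    return total / len(points_a)

# One scene seen through two hypothetical lenses (negative k gives barrel distortion).
scene = [(0.1, 0.2), (-0.4, 0.3), (0.5, -0.5), (0.0, 0.7)]
k_lens_a, k_lens_b = -0.3, -0.8
view_a = distort(scene, k_lens_a)
view_b = distort(scene, k_lens_b)

# With the true parameters, both views rectify to the same points.
loss_correct = consistency_loss(rectify(view_a, k_lens_a), rectify(view_b, k_lens_b))

# With a wrong shared guess, the rectified views disagree, so the loss is positive.
loss_wrong = consistency_loss(rectify(view_a, -0.1), rectify(view_b, -0.1))

print(loss_correct, loss_wrong)
```

The loss needs no ground-truth parameters: it only compares the two rectified results with each other, which is what makes a training signal of this kind self-supervised.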
