CV · AI · Nov 25, 2022

PIP: Positional-encoding Image Prior

arXiv:2211.14298v39 citations · h-index: 56
Originality: Incremental advance
AI Analysis

This work addresses image and video reconstruction for computer vision. It offers a more parameter-efficient method that is also more stable on video, but the advance is incremental because it builds on existing DIP concepts.

The authors revisit the Deep Image Prior framework for image reconstruction, replacing convolutions with pixel-level MLPs driven by Fourier features. The resulting method performs similarly to DIP with fewer parameters and extends effectively to video tasks.

In Deep Image Prior (DIP), a Convolutional Neural Network (CNN) is fitted to map a latent space to a degraded (e.g. noisy) image, but in the process it learns to reconstruct the clean image. This phenomenon is attributed to the CNN's internal image prior. We revisit the DIP framework, examining it from the perspective of a neural implicit representation. Motivated by this perspective, we replace the random or learned latent with Fourier features (positional encoding). We show that, thanks to the properties of Fourier features, we can replace the convolution layers with simple pixel-level MLPs. We name this scheme "Positional Encoding Image Prior" (PIP) and show that it performs very similarly to DIP on various image-reconstruction tasks while requiring far fewer parameters. Additionally, we demonstrate that PIP can be easily extended to videos, where 3D-DIP struggles and suffers from instability. Code and additional examples for all tasks, including videos, are available on the project page: https://nimrodshabtay.github.io/PIP/
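The two ingredients named in the abstract, a Fourier-feature encoding of pixel coordinates and an MLP applied to each pixel independently, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the power-of-two frequency schedule, the layer sizes, and the sigmoid output are all assumptions chosen for illustration. The fitting loop that would optimize the weights against a degraded image is omitted.

```python
import numpy as np

def fourier_features(height, width, num_freqs=8):
    """Encode each pixel coordinate with sines and cosines (assumed 2^k frequencies)."""
    ys = np.linspace(0.0, 1.0, height, endpoint=False)
    xs = np.linspace(0.0, 1.0, width, endpoint=False)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    coords = np.stack([grid_y, grid_x], axis=-1)        # (H, W, 2)
    freqs = 2.0 ** np.arange(num_freqs)                 # (F,)
    angles = 2.0 * np.pi * coords[..., None] * freqs    # (H, W, 2, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(height, width, -1)             # (H, W, 4F)

def init_mlp(in_dim, hidden_dim, out_dim, num_hidden=2, seed=0):
    """Random weights for a small MLP (hypothetical sizes, He-style scaling)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden_dim] * num_hidden + [out_dim]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / d_in), (d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]

def pixel_mlp(feats, params):
    """Apply the same MLP to every pixel's feature vector; no spatial mixing."""
    h = feats
    for i, (w, b) in enumerate(params):
        h = h @ w + b                     # matmul acts on the last axis only
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)        # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-h))       # sigmoid keeps outputs in (0, 1)

feats = fourier_features(32, 32)
params = init_mlp(feats.shape[-1], 64, 3)
image = pixel_mlp(feats, params)
print(image.shape)  # (32, 32, 3)
```

Because the MLP only acts on the last axis, each output pixel depends solely on its own encoded coordinate. Any spatial structure in the output therefore has to come from the encoding, which is the role the abstract assigns to the Fourier features.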

Code Implementations: 1 repo