Decomposer: Semi-supervised Learning of Image Restoration and Image Decomposition
This work addresses image restoration and decomposition for computer vision applications, but it appears incremental: it builds on existing transformer and U-Net architectures and targets a specific dataset.
The paper tackles the problem of decomposing distorted image sequences into the original image and the applied augmentations (shadows, lighting, and occlusions). It uses a transformer-based model that learns to differentiate between the distortions through semi-supervised pre-training.
We present Decomposer, a semi-supervised reconstruction model that decomposes distorted image sequences into their fundamental building blocks - the original image and the applied augmentations, i.e., shadow, light, and occlusions. To solve this problem, we use the SIDAR dataset that provides a large number of distorted image sequences: each sequence contains images with shadows, lighting, and occlusions applied to an undistorted version. Each distortion changes the original signal in different ways, e.g., additive or multiplicative noise. We propose a transformer-based model to explicitly learn this decomposition. The sequential model uses 3D Swin-Transformers for spatio-temporal encoding and 3D U-Nets as prediction heads for individual parts of the decomposition. We demonstrate that by separately pre-training our model on weakly supervised pseudo labels, we can steer our model to optimize for our ambiguous problem definition and learn to differentiate between the different image distortions.
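To make the decomposition target and its ambiguity concrete, the following is a minimal sketch of one possible forward model. It is an assumption for illustration only and not the SIDAR rendering pipeline or the authors' formulation: the shadow is treated as a multiplicative factor, the light as an additive term, and the occlusion as an alpha-blended layer. The function `compose` and all variable names are hypothetical.

```python
import numpy as np

def compose(original, shadow, light, occ_rgb, occ_mask):
    """Hypothetical forward model (an assumption, not taken from the paper):
    multiplicative shadow, additive light, alpha-blended occlusion."""
    lit = original * shadow + light                     # shadow scales, light adds
    out = (1.0 - occ_mask) * lit + occ_mask * occ_rgb   # occluder replaces pixels
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
H, W = 4, 4
original = rng.uniform(0.2, 0.8, size=(H, W, 3))

shadow = np.ones((H, W, 1))
shadow[:2] = 0.5                 # top half of the image is shadowed
light = np.zeros((H, W, 1))
light[:, :2] = 0.1               # left half receives extra light
occ_mask = np.zeros((H, W, 1))
occ_mask[3, 3] = 1.0             # a single fully occluded pixel
occ_rgb = np.zeros((H, W, 3))    # black occluder

distorted = compose(original, shadow, light, occ_rgb, occ_mask)

# Ambiguity: a different (shadow, light) pair explains the same observation.
alt_shadow = np.ones((H, W, 1))
alt_light = original * shadow + light - original   # absorbs the shadow into "light"
alt_distorted = compose(original, alt_shadow, alt_light, occ_rgb, occ_mask)
same_observation = np.allclose(distorted, alt_distorted)
```

Under this assumed model, two different shadow and light pairs reproduce the same distorted image, which illustrates why reconstruction alone leaves the problem ambiguous and why extra guidance, such as the pre-training on pseudo labels described above, is needed to separate the distortions.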