Rethinking Encoder-Decoder Flow Through Shared Structures
For computer vision researchers working on dense prediction, this work addresses a decoder-side bottleneck and offers an incremental improvement over existing methods.
The paper tackles the problem of limited decoder innovation in dense prediction tasks by introducing shared structures called banks, which improve the depth estimation performance of state-of-the-art transformer-based architectures on natural and synthetic images.
Dense prediction tasks have enjoyed increasingly complex encoder architectures; decoders, however, have remained largely the same. They rely on individual blocks that decode intermediate feature maps sequentially. We introduce banks, shared structures that every decoding block uses as additional context during decoding. Applied via resampling and feature fusion, banks improve depth estimation performance for state-of-the-art transformer-based architectures on natural and synthetic images when trained on large-scale datasets.
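The abstract does not specify how a bank is built or fused, so the following is only a minimal, hypothetical NumPy sketch of the idea under stated assumptions: the bank is formed by resampling every encoder map to the coarsest resolution, projecting each to a common channel width and averaging; each decoding block then upsamples its running state, adds the matching encoder map and adds the bank resampled to its own resolution. Nearest-neighbour resampling, additive fusion, random channel projections in place of learned weights, and all names (`resample`, `project`, `build_bank`, `decode`, `use_bank`) are illustrative assumptions, not the paper's method.

```python
import numpy as np


def resample(x, size):
    """Nearest-neighbour resample of a (C, H, W) map to (C, size[0], size[1])."""
    _, h, w = x.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return x[:, rows][:, :, cols]


def project(x, weight):
    """1x1-convolution stand-in: mix channels with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", weight, x)


def build_bank(features, size, weights):
    """Hypothetical bank: resample every encoder map to one size, project, average."""
    return np.mean([project(resample(f, size), w) for f, w in zip(features, weights)], axis=0)


def decode(features, use_bank=True, dim=8, seed=0):
    """Sequentially decode encoder maps ordered coarse to fine.

    Each block upsamples the running state, fuses the matching encoder map
    and, if use_bank is True, the shared bank resampled to that block's size.
    """
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned weights (illustration only).
    skip_w = [rng.standard_normal((dim, f.shape[0])) / np.sqrt(f.shape[0]) for f in features]
    bank_w = [rng.standard_normal((dim, f.shape[0])) / np.sqrt(f.shape[0]) for f in features]
    bank = build_bank(features, features[0].shape[1:], bank_w) if use_bank else None

    state = np.zeros((dim,) + features[0].shape[1:])
    for f, w in zip(features, skip_w):
        size = f.shape[1:]
        state = resample(state, size) + project(f, w)  # standard sequential decoding
        if bank is not None:
            state = state + resample(bank, size)       # extra shared context from the bank
        state = np.maximum(state, 0.0)                 # ReLU nonlinearity
    return state


# Hypothetical encoder pyramid of (channels, H, W) maps, coarse to fine.
rng = np.random.default_rng(1)
feats = [
    rng.standard_normal((64, 4, 4)),
    rng.standard_normal((32, 8, 8)),
    rng.standard_normal((16, 16, 16)),
]
out = decode(feats)
print(out.shape)  # (8, 16, 16)
```

Because both weight sets are drawn regardless of `use_bank`, calling `decode(feats, use_bank=False)` reuses the same skip projections, so the difference between the two outputs isolates the bank's contribution in this toy setting.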