Dynamic Steerable Blocks in Deep Residual Networks
This work addresses the need for more efficient and adaptable filters in deep learning for computer vision, though it is incremental as it builds on existing residual and dense network architectures.
The paper tackles the problem that convolutional filters do not incorporate prior visual knowledge. It introduces dynamic steerable blocks, which transform filters under pre-defined transformations conditioned on the input. The approach improves performance on CIFAR-10+ and outperforms methods that do not use pre-training on the Berkeley Segmentation contour detection dataset.
Filters in convolutional networks are typically parameterized in a pixel basis that does not take prior knowledge about the visual world into account. We investigate the generalized notion of frames designed with image properties in mind as alternatives to this parameterization. We show that frame-based ResNets and DenseNets consistently improve performance on CIFAR-10+ while having additional desirable properties such as steerability. By exploiting these transformation properties explicitly, we arrive at dynamic steerable blocks. They are an extension of residual blocks that can seamlessly transform filters under pre-defined transformations, conditioned on the input at both training and inference time. Dynamic steerable blocks learn the degree of invariance from data and adapt filters locally, which allows them to apply a different geometric variant of the same filter at each location of the feature map. When evaluated on the Berkeley Segmentation contour detection dataset, our approach outperforms all competing approaches that do not use pre-training. Our results highlight the benefits of image-based regularization for deep networks.
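To make the notion of steerability concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-element frame of first-order Gaussian derivative filters, a standard example of a steerable basis; the abstract does not say which frames are used. The names `gaussian_derivative_frame` and `steer` are hypothetical. The sketch shows that a filter written as coefficients over the frame can be rotated by acting on the coefficients alone, and that, by linearity, the response of the steered filter on a patch is the same combination of the fixed frame responses, which is what makes a per-location choice of angle cheap.

```python
import numpy as np

def gaussian_derivative_frame(size=5, sigma=1.0):
    """Two first-order Gaussian derivative filters (d/dx, d/dy).

    Assumed frame for illustration only; returns shape (2, size, size).
    """
    r = (size - 1) / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    gx = -x / sigma ** 2 * g
    gy = -y / sigma ** 2 * g
    return np.stack([gx, gy])

def steer(coeffs, theta):
    """Rotate a filter given in the (gx, gy) frame by rotating its coefficients."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ coeffs

frame = gaussian_derivative_frame()

# A filter is a linear combination of frame elements; here it is simply gx.
coeffs = np.array([1.0, 0.0])

# Steering by 90 degrees turns the x-derivative filter into the y-derivative filter.
steered_coeffs = steer(coeffs, np.pi / 2)
steered_filter = np.tensordot(steered_coeffs, frame, axes=1)

# Linearity: the steered filter's response on a patch is a combination of the
# two fixed frame responses, so the angle can differ at every location.
rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5))
frame_responses = np.tensordot(frame, patch, axes=([1, 2], [0, 1]))  # shape (2,)
theta = 0.3  # hypothetical angle predicted for this location
response_direct = np.sum(np.tensordot(steer(coeffs, theta), frame, axes=1) * patch)
response_from_frame = steer(coeffs, theta) @ frame_responses
```

In a dynamic block, the angle would presumably be predicted from the input at each location rather than fixed as it is here; that part is not sketched.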