DeSTNet: Densely Fused Spatial Transformer Networks
This work addresses the sensitivity of computer vision models to spatial transformations and improves their performance, but it is incremental, since it builds on existing STN methods.
The paper tackles the degradation of CNNs under large intra-class spatial variability by proposing DeSTNet, a dense fusion pattern for combining multiple Spatial Transformer Networks, which outperforms STNs and CSTNs in accuracy and robustness on the MNIST, GTSRB, and IDocDB benchmarks.
Modern Convolutional Neural Networks (CNNs) are extremely powerful on a range of computer vision tasks. However, their performance may degrade when the data is characterised by large intra-class variability caused by spatial transformations. The Spatial Transformer Network (STN) is currently the method of choice for giving CNNs the ability to remove those transformations and improve performance in an end-to-end learning framework. In this paper, we propose the Densely Fused Spatial Transformer Network (DeSTNet), which, to the best of our knowledge, is the first dense fusion pattern for combining multiple STNs. Specifically, we show how changing the connectivity pattern of multiple STNs from sequential to dense leads to more powerful alignment modules. Extensive experiments on three benchmarks, namely MNIST, GTSRB, and IDocDB, show that the proposed technique outperforms related state-of-the-art methods (i.e., STNs and CSTNs) in terms of both accuracy and robustness.
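The difference between sequential and dense connectivity can be illustrated with a toy sketch. This is not the DeSTNet architecture: the abstract does not specify how the fusion is implemented, so everything below is an assumption made for illustration. In particular, the 6-parameter affine representation, the helper names (`to_matrix`, `sequential_chain`, `dense_chain`), and the `fuse` callable are hypothetical, and the per-stage updates are passed in as fixed vectors rather than predicted by a network from the warped image.

```python
import numpy as np

def to_matrix(p):
    """Map 6 affine parameters to a 3x3 homogeneous matrix (identity at p = 0)."""
    return np.array([[1.0 + p[0], p[1], p[2]],
                     [p[3], 1.0 + p[4], p[5]],
                     [0.0, 0.0, 1.0]])

def sequential_chain(updates):
    """Sequential connectivity: each stage composes its own update
    with the running warp and sees nothing else."""
    warp = np.eye(3)
    for dp in updates:
        warp = to_matrix(dp) @ warp
    return warp

def dense_chain(updates, fuse):
    """Dense connectivity (toy version): each stage fuses its update with
    the updates of ALL earlier stages before composing.
    `fuse` is a hypothetical stand-in for a learned fusion block; it maps
    an array of shape (num_stages_so_far, 6) to a single 6-vector."""
    warp = np.eye(3)
    history = []
    for dp in updates:
        history.append(dp)
        fused = fuse(np.stack(history))  # has access to every update so far
        warp = to_matrix(fused) @ warp
    return warp

if __name__ == "__main__":
    updates = [np.array([0.1, 0.0, 0.5, 0.0, 0.1, -0.5]),
               np.array([0.0, 0.2, 0.0, 0.2, 0.0, 0.0])]
    print(sequential_chain(updates))
    # Averaging is an arbitrary choice of fusion, used only for the demo.
    print(dense_chain(updates, fuse=lambda h: h.mean(axis=0)))
```

If `fuse` simply returns the most recent update (`lambda h: h[-1]`), the dense chain reduces to the sequential one, which makes the sequential pattern a special case of the dense pattern in this sketch.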