CV | Jul 24, 2023

Entropy Transformer Networks: A Learning Approach via Tangent Bundle Data Manifold

arXiv:2307.12517v1 | 1 citation | h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in CNN architectures for computer vision, offering an incremental improvement over standard STNs.

The paper addresses two weaknesses of Spatial Transformer Networks (STNs): poor performance under scale variations and a failure to preserve gradient norms. It proposes Entropy STN (ESTN), which interpolates on data manifold distributions with entropy regularization, and reports improved predictive accuracy and reduced computational cost on image reconstruction and classification tasks.

This paper focuses on an accurate and fast interpolation approach for image transformation in CNN architectures. Standard Spatial Transformer Networks (STNs) rely on bilinear or linear interpolation, which makes unrealistic assumptions about the underlying data distribution and leads to poor performance under scale variations. Moreover, STNs do not preserve the norm of gradients during propagation because they depend on sparse neighboring pixels. To address these problems, a novel Entropy STN (ESTN) is proposed that interpolates on the data manifold distributions. In particular, random samples are generated for each pixel in association with the tangent space of the data manifold, and a linear approximation of their intensity values is constructed with an entropy regularizer to compute the transformer parameters. A simple yet effective technique is also proposed to normalize the non-zero values of the convolution operation, fine-tuning the layers for gradient norm regularization during training. Experiments on challenging benchmarks show that the proposed ESTN improves predictive accuracy over a range of computer vision tasks, including image reconstruction and classification, while reducing the computational cost.
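
For reference, the bilinear interpolation that the abstract attributes to standard STNs can be sketched as follows. This is a minimal NumPy illustration of that baseline only, not of the ESTN method; the function name `bilinear_sample` and the toy 4x4 image are hypothetical and chosen for illustration. It shows that each sampled value depends on just four neighboring pixels, which is the "sparse neighboring pixels" dependency the abstract refers to.

```python
import numpy as np


def bilinear_sample(image, x, y):
    """Sample a 2-D image at a real-valued (x, y) from its four nearest pixels.

    Hypothetical helper for illustration; assumes the image is at least 2x2.
    """
    h, w = image.shape
    # Clamp the query point so the 2x2 neighbourhood stays inside the image.
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0 = min(int(np.floor(x)), w - 2)
    y0 = min(int(np.floor(y)), h - 2)
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    # The four weights are non-negative and sum to 1.
    w00 = (1 - dx) * (1 - dy)
    w01 = dx * (1 - dy)
    w10 = (1 - dx) * dy
    w11 = dx * dy
    # Only these four pixels contribute, so only they receive gradient.
    return (w00 * image[y0, x0] + w01 * image[y0, x1]
            + w10 * image[y1, x0] + w11 * image[y1, x1])


# Toy image with intensity 4*y + x, so bilinear interpolation is exact here.
image = np.arange(16, dtype=float).reshape(4, 4)
value = bilinear_sample(image, 1.5, 2.25)
```

Because every output pixel is a weighted sum of only four inputs, all other pixels get zero gradient from that sample. The abstract describes ESTN as replacing this fixed 2x2 scheme with a linear approximation fitted over random samples drawn per pixel.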

