Deep Generalized Max Pooling
This work addresses a specific bottleneck in CNN architectures for domain-specific tasks like historical document analysis, offering an incremental improvement over existing pooling methods.
The paper addresses a limitation of global pooling layers in CNNs: they pool each activation map independently of spatial location, which can unbalance the contributions of frequent and rare activations. It proposes Deep Generalized Max Pooling, which re-weights descriptors to equalize their impact. The authors report that the layer outperforms average and max pooling on the classification of Latin medieval manuscripts (CLAMM'16, CLAMM'17) and on writer identification (Historical-WI'17), though the abstract gives no specific numerical gains.
Global pooling layers are an essential part of Convolutional Neural Networks (CNNs). They are used to aggregate activations of spatial locations to produce a fixed-size vector in several state-of-the-art CNNs. Global average pooling and global max pooling are commonly used to convert convolutional features of variable-size images to a fixed-size embedding. However, both pooling layer types are computed in a spatially independent manner: each individual activation map is pooled separately, and thus activations of different locations are pooled together. In contrast, we propose Deep Generalized Max Pooling, which balances the contribution of all activations of a spatially coherent region by re-weighting all descriptors so that the impact of frequent and rare ones is equalized. We show that this layer is superior to both average and max pooling on the classification of Latin medieval manuscripts (CLAMM'16, CLAMM'17), as well as writer identification (Historical-WI'17).
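The abstract does not spell out how the re-weighting is computed. As a minimal sketch, the NumPy code below assumes the ridge-regression formulation commonly used for generalized max pooling: find a pooled vector whose dot product with every local descriptor is close to 1, so that frequent and rare descriptors contribute equally. The function names, the `lam` regularization parameter, and the `(C, H, W)` layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def global_avg_pool(fmap):
    """Baseline: average each activation map over all spatial locations."""
    return fmap.mean(axis=(1, 2))

def global_max_pool(fmap):
    """Baseline: maximum of each activation map over all spatial locations."""
    return fmap.max(axis=(1, 2))

def generalized_max_pool(fmap, lam=1e-3):
    """Sketch of generalized max pooling (assumed ridge-regression form).

    fmap: array of shape (C, H, W); each spatial location holds one
          C-dimensional local descriptor.
    lam:  hypothetical regularization strength (assumption, not from the source).
    Returns a C-dimensional vector xi minimizing
        ||phi.T @ xi - 1||^2 + lam * ||xi||^2.
    """
    c, h, w = fmap.shape
    phi = fmap.reshape(c, h * w)            # columns are local descriptors
    gram = phi @ phi.T + lam * np.eye(c)    # (C, C) regularized system matrix
    rhs = phi.sum(axis=1)                   # phi @ ones(N)
    return np.linalg.solve(gram, rhs)
```

For a toy feature map in which nine locations share one descriptor and a single location holds a different, orthogonal one, average pooling yields (0.9, 0.1), so the rare descriptor barely registers. Under the assumptions above, the sketch returns a vector close to (1, 1), so both descriptors have nearly the same similarity to the pooled representation.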