CV NE · Jul 7, 2012

Object Recognition with Multi-Scale Pyramidal Pooling Networks

Jonathan Masci, Ueli Meier, Gabriel Fricout, Jürgen Schmidhuber

arXiv:1207.1765v1

Originality: Incremental advance

AI Analysis

This work addresses a practical constraint in computer vision: in tasks such as industrial inspection, input image sizes vary. It offers an incremental improvement over existing methods.

The paper tackles object recognition on variable-sized images by introducing a Multi-Scale Pyramidal Pooling Network. The network removes the need for equal-sized inputs and improves generalization, especially when training data is scarce. It achieves competitive results on benchmark datasets and is applicable to industrial defect classification.

We present a Multi-Scale Pyramidal Pooling Network, featuring a novel pyramidal pooling layer at multiple scales and a novel encoding layer. Thanks to the former, the network does not require all images of a given classification task to be of equal size. The encoding layer improves generalisation performance in comparison to similar neural network architectures, especially when training data is scarce. We evaluate and compare our system to convolutional neural networks and state-of-the-art computer vision methods on various benchmark datasets. We also present results on industrial steel defect classification, where existing architectures are not applicable because of the constraint on equally sized input images. The proposed architecture can be seen as a fully supervised hierarchical bag-of-features extension that is trained online and can be fine-tuned for any given task.
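To illustrate why pooling over a pyramid of grids frees a network from fixed input sizes, the following is a minimal NumPy sketch of the general idea, not the authors' implementation. The function name `pyramidal_max_pool`, the choice of max pooling, and the grid levels `(1, 2, 4)` are assumptions made for illustration; the abstract does not specify the layer's details.

```python
import numpy as np

def pyramidal_max_pool(feature_maps, levels=(1, 2, 4)):
    """Pool a (channels, height, width) array into a fixed-length vector.

    Hypothetical helper: for each level n, the spatial plane is split into
    an n x n grid and each cell is reduced to its per-channel maximum.
    The output length is channels * sum(n * n for n in levels), whatever
    the input height and width are.
    """
    channels, height, width = feature_maps.shape
    pooled = []
    for n in levels:
        # Integer bin edges that split each axis into n roughly equal parts.
        row_edges = np.linspace(0, height, n + 1).astype(int)
        col_edges = np.linspace(0, width, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                # Guarantee every cell covers at least one row and column,
                # even when the map is smaller than the grid.
                r0 = row_edges[i]
                r1 = max(row_edges[i + 1], r0 + 1)
                c0 = col_edges[j]
                c1 = max(col_edges[j + 1], c0 + 1)
                cell = feature_maps[:, r0:r1, c0:c1]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps of different spatial size yield vectors of equal length.
small = pyramidal_max_pool(np.random.rand(3, 17, 23))
large = pyramidal_max_pool(np.random.rand(3, 40, 9))
print(small.shape, large.shape)
```

Because the number of grid cells depends only on `levels`, any classifier placed after this step sees the same input dimension for every image, which is the property the abstract attributes to the pyramidal pooling layer.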
