LG · ML · Feb 22, 2020

An Optimization and Generalization Analysis for Max-Pooling Networks

arXiv:2002.09781v4 · 9 citations
AI Analysis

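This paper offers foundational insights for machine vision researchers by addressing a gap in the theory of deep learning architectures: when max-pooling networks can be globally optimized, and how over-parameterization affects their generalization.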

The paper addresses the theoretical understanding of max-pooling in convolutional networks. It proves that a convolutional max-pooling architecture can be globally optimized and can generalize well even when highly over-parameterized. Empirical validation shows that CNNs significantly outperform fully connected networks in the pattern detection setting the paper studies.

Max-pooling operations are a core component of deep learning architectures. In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems. However, these architectures are not well understood from a theoretical perspective. For example, we do not understand when they can be globally optimized, or what effect over-parameterization has on generalization. Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized and can generalize well even for highly over-parameterized models. Our analysis focuses on a data-generating distribution inspired by pattern detection problems, where a "discriminative" pattern needs to be detected among "spurious" patterns. We empirically validate that CNNs significantly outperform fully connected networks in our setting, as predicted by our theoretical results.
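To make the setting in the abstract concrete, the following is a minimal sketch of a pattern detection input and a convolutional max-pooling forward pass. It is an illustration under stated assumptions, not the paper's exact construction: the one-hot pattern encoding, the function names `make_example` and `maxpool_net`, and the hand-set filter are all hypothetical choices made here, and no training is performed.

```python
import numpy as np


def make_example(rng, n_patterns, dim, positive):
    """Build one input made of n_patterns patterns, each a vector of length dim.

    Assumed toy distribution (not from the paper): spurious patterns are
    one-hot vectors on coordinates 1..dim-1, and the discriminative pattern
    is the one-hot vector on coordinate 0. A positive example contains the
    discriminative pattern at one random position; a negative example
    contains only spurious patterns.
    """
    idx = rng.integers(1, dim, size=n_patterns)  # spurious coordinates only
    x = np.zeros((n_patterns, dim))
    x[np.arange(n_patterns), idx] = 1.0
    if positive:
        pos = rng.integers(0, n_patterns)  # random location of the pattern
        x[pos] = 0.0
        x[pos, 0] = 1.0
    return x


def maxpool_net(x, filters, signs):
    """Apply shared filters to every pattern, ReLU, max-pool, then combine.

    x:       (n_patterns, dim) input patterns
    filters: (k, dim) filters shared across pattern positions (convolution)
    signs:   (k,) fixed second-layer weights
    """
    responses = np.maximum(x @ filters.T, 0.0)  # (n_patterns, k)
    pooled = responses.max(axis=0)              # max over positions, (k,)
    return float(signs @ pooled)


rng = np.random.default_rng(0)
dim, n_patterns = 8, 5

# Hand-set detector for the discriminative pattern (illustrative, not learned).
filters = np.zeros((1, dim))
filters[0, 0] = 1.0
signs = np.array([1.0])

pos_score = maxpool_net(make_example(rng, n_patterns, dim, True), filters, signs)
neg_score = maxpool_net(make_example(rng, n_patterns, dim, False), filters, signs)
print(pos_score, neg_score)
```

Because the same filter is applied at every position and the responses are max-pooled, the score does not depend on where the discriminative pattern appears. A fully connected network has no such weight sharing built in, which gives some intuition for the comparison between CNNs and fully connected networks mentioned above.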

