LGCCMLJun 26, 2020

Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time

arXiv:2006.14798v328 citations
Originality Highly original
AI Analysis

This provides a theoretical foundation for globally optimizing CNN training, which could benefit researchers and practitioners in machine learning by offering efficient and interpretable solutions.

The authors tackled the non-convex optimization problem in training Convolutional Neural Networks (CNNs) by introducing exact convex optimization formulations for two- and three-layer CNNs with ReLU activations, achieving polynomial-time complexity with respect to data samples, neurons, and dimension.

We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an $\ell_2$ norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an $\ell_1$ regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes