LGMLJun 9, 2020

The Curious Case of Convex Neural Networks

arXiv:2006.05103v332 citations
Originality Incremental advance
AI Analysis

This work addresses generalization issues for practitioners using neural networks, though it is incremental as it builds on existing architectures with added constraints.

The paper tackles the problem of overfitting and generalization in neural networks by enforcing convexity constraints, resulting in improved performance and robustness to label noise across standard image classification datasets.

In this paper, we investigate a constrained formulation of neural networks where the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The convexity constraints include restricting the weights (for all but the first layer) to be non-negative and using a non-decreasing convex activation function. Albeit simple, these constraints have profound implications on the generalization abilities of the network. We draw three valuable insights: (a) Input Output Convex Neural Networks (IOC-NNs) self regularize and reduce the problem of overfitting; (b) Although heavily constrained, they outperform the base multi layer perceptrons and achieve similar performance as compared to base convolutional architectures and (c) IOC-NNs show robustness to noise in train labels. We demonstrate the efficacy of the proposed idea using thorough experiments and ablation studies on standard image classification datasets with three different neural network architectures.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes