ML · LG · Feb 11, 2023

Global Convergence Rate of Deep Equilibrium Models with General Activations

arXiv:2302.05797v4 · 4 citations · h-index: 28
Originality: Incremental advance
AI Analysis

This work provides a theoretical convergence guarantee for DEQs with a broad class of activations. The advance is incremental: it builds on existing convergence proofs for ReLU DEQs and broadens their applicability.

The paper tackles the problem of proving global convergence for over-parametrized Deep Equilibrium Models (DEQs) with general activation functions. It shows that, for the quadratic loss, gradient descent converges to a globally optimal solution at a linear rate, extending prior results from ReLU to any activation with bounded first and second derivatives.

In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. This paper shows that this result still holds for DEQs with any general activation that has bounded first and second derivatives. Since the new activation function is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this task, we need to create a novel population Gram matrix and develop a new form of dual activation with Hermite polynomial expansion.
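
To make the objects named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's construction: the parametrization z = tanh(W z + U x) is one common DEQ form, tanh is chosen only as an example of an activation with bounded first and second derivatives, and the sizes, weight scaling, and helper names (`deq_equilibrium`, `min_eig_gram_2x2`) are hypothetical. The sketch finds the equilibrium point for two inputs by fixed-point iteration and computes the least eigenvalue of their 2x2 Gram matrix.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def deq_equilibrium(W, U, x, tol=1e-10, max_iter=1000):
    """Iterate z <- tanh(W z + U x) until successive iterates differ by < tol."""
    ux = matvec(U, x)
    z = [0.0] * len(W)
    for _ in range(max_iter):
        z_new = [math.tanh(a + b) for a, b in zip(matvec(W, z), ux)]
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    return z

def min_eig_gram_2x2(z1, z2):
    """Least eigenvalue of the Gram matrix [[<z1,z1>, <z1,z2>], [<z1,z2>, <z2,z2>]]."""
    a, b, d = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b * b)

random.seed(0)
m, d_in = 8, 3  # hidden width and input dimension (arbitrary toy sizes)

# Assumption: rescale W so every row has absolute sum <= 0.5. Because tanh is
# 1-Lipschitz, the update map is then a sup-norm contraction and the
# fixed-point iteration converges to a unique equilibrium.
W = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(m)]
scale = 0.5 / max(sum(abs(w) for w in row) for row in W)
W = [[scale * w for w in row] for row in W]
U = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(m)]

x1, x2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
z1 = deq_equilibrium(W, U, x1)
z2 = deq_equilibrium(W, U, x2)
lam_min = min_eig_gram_2x2(z1, z2)
print("least Gram eigenvalue:", lam_min)
```

The quantity `lam_min` is the toy analogue of the least eigenvalue that the abstract says must be bounded. The sketch only evaluates it for one random draw and says nothing about the population Gram matrix or the Hermite-based dual activation used in the actual analysis.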
