LG · AI · CV · Feb 5, 2025

Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics

arXiv:2502.03654v2 · 1 citation · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective activation functions to enhance training dynamics in deep learning, though it appears incremental as it builds on existing self-gated activation paradigms.

The paper introduces the Gompertz Linear Unit (GoLU), a new self-gated activation function, and reports superior performance across diverse tasks compared to state-of-the-art alternatives such as GELU and Swish.

Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as LeakyReLU, PReLU, and ELU that better handle negative neuron outputs. Recently, self-gated activations like GELU and Swish have emerged as state-of-the-art alternatives, leveraging their smoothness to ensure stable gradient flow and prevent neuron inactivity. In this work, we introduce the Gompertz Linear Unit (GoLU), a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$. The GoLU activation leverages the right-skewed asymmetry in the Gompertz function to reduce variance in the latent space more effectively compared to GELU and Swish, while preserving robust gradient flow. Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activation functions, establishing GoLU as a robust alternative to existing activation functions.
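The definition in the abstract is compact enough to sketch directly. The following is a minimal scalar illustration in plain Python, not the authors' implementation. The function names, the overflow guard, and the comparison gates (the logistic sigmoid for Swish with unit slope, and the standard normal CDF for GELU) are assumptions added here for illustration.

```python
import math


def gompertz(x: float) -> float:
    """Gompertz gate exp(-exp(-x)), as defined in the abstract."""
    # Guard added for this sketch: math.exp overflows for arguments above
    # roughly 709, and for such negative x the gate is 0 in double precision.
    if x < -700.0:
        return 0.0
    return math.exp(-math.exp(-x))


def golu(x: float) -> float:
    """GoLU(x) = x * Gompertz(x)."""
    return x * gompertz(x)


def sigmoid_gate(x: float) -> float:
    """Gate used by Swish with unit slope: 1 / (1 + exp(-x))."""
    # Same overflow guard as above (an assumption of this sketch).
    if x < -700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_cdf_gate(x: float) -> float:
    """Gate used by GELU: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print the three gates side by side at a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"x={x:+.1f}  gompertz={gompertz(x):.4f}  "
            f"sigmoid={sigmoid_gate(x):.4f}  gauss_cdf={gaussian_cdf_gate(x):.4f}  "
            f"GoLU={golu(x):+.4f}"
        )
```

The sigmoid and Gaussian CDF gates both equal 0.5 at the origin and are point-symmetric about it, whereas the Gompertz gate equals $e^{-1}$ at $x = 0$ and is not symmetric. This is the asymmetry the abstract refers to.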

Code Implementations: 1 repo
