LG · AI · May 29, 2025

SG-Blend: Learning an Interpolation Between Improved Swish and GELU for Robust Neural Representations

arXiv:2505.23942v1 · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for robust activation functions in deep learning, but the contribution is incremental because it builds on existing functions such as Swish and GELU.

The paper addresses activation-function design for deep neural networks by introducing SG-Blend, which blends improved Swish and GELU variants through learnable interpolation. It reports performance improvements on natural language and computer vision tasks with negligible computational overhead.

The design of activation functions remains a pivotal component in optimizing deep neural networks. While prevailing choices like Swish and GELU demonstrate considerable efficacy, they often exhibit domain-specific optima. This work introduces SG-Blend, a novel activation function that blends our proposed SSwish, a first-order symmetric variant of Swish, and the established GELU through dynamic interpolation. By adaptively blending these constituent functions via learnable parameters, SG-Blend aims to harness their complementary strengths (SSwish's controlled non-monotonicity and symmetry, and GELU's smooth, probabilistic profile) to achieve a more universally robust balance between model expressivity and gradient stability. We conduct comprehensive empirical evaluations across diverse modalities and architectures, showing performance improvements across all considered natural language and computer vision tasks and models. These results, achieved with negligible computational overhead, underscore SG-Blend's potential as a versatile, drop-in replacement that consistently outperforms strong contemporary baselines. The code is available at https://anonymous.4open.science/r/SGBlend-6CBC.
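The blending idea can be illustrated with a minimal scalar sketch. The abstract does not give the formula for SSwish, so plain Swish, x·sigmoid(beta·x), stands in for it here. The sigmoid-gated convex combination and the names `sg_blend_sketch`, `gate_logit`, and `beta` are hypothetical and are not the authors' parameterization. In a real network, `gate_logit` and `beta` would be trainable parameters updated alongside the weights.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x).

    ASSUMPTION: stand-in for the paper's SSwish, whose exact
    definition is not reproduced here.
    """
    return x * sigmoid(beta * x)


def sg_blend_sketch(x: float, gate_logit: float = 0.0, beta: float = 1.0) -> float:
    """Hypothetical SG-Blend-style activation.

    A learnable logit is squashed into a mixing weight alpha in (0, 1),
    and the output is the convex combination
        alpha * swish(x) + (1 - alpha) * gelu(x).
    """
    alpha = sigmoid(gate_logit)
    return alpha * swish(x, beta) + (1.0 - alpha) * gelu(x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, round(sg_blend_sketch(x), 6))
```

With `gate_logit = 0` the two components are weighted equally. Large positive or negative logits push the blend toward pure Swish or pure GELU, so a trained gate can settle wherever a given task prefers.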
