LG | Dec 16, 2025

How Does Fourier Analysis Network Work? A Mechanism Analysis and a New Dual-Activation Layer Proposal

arXiv:2512.14873v2

Originality: Incremental advance

AI Analysis

This work targets slow convergence and related optimization issues in neural networks, proposing an incremental improvement to activation functions that is relevant to both researchers and practitioners.

The paper examines the previously unclear mechanism behind the gains reported for Fourier Analysis Network (FAN). It shows that only the sine activation improves performance, and that it does so by mitigating vanishing gradients near zero rather than through its periodic nature. Building on this, the authors develop the Dual-Activation Layer (DAL) as a more efficient convergence accelerator. In evaluations on noisy sinusoidal signal classification, MNIST digit classification, and ECG-based biometric recognition, DAL models converged faster and achieved equal or higher validation accuracy than models with conventional activations.

Fourier Analysis Network (FAN) was recently proposed as a simple way to improve neural network performance by replacing part of Rectified Linear Unit (ReLU) activations with sine and cosine functions. Although several studies have reported small but consistent gains across tasks, the underlying mechanism behind these improvements has remained unclear. In this work, we show that only the sine activation contributes positively to performance, whereas the cosine activation tends to be detrimental. Our analysis reveals that the improvement is not a consequence of the sine function's periodic nature; instead, it stems from the function's local behavior near x = 0, where its non-zero derivative mitigates the vanishing-gradient problem. We further show that FAN primarily alleviates the dying-ReLU problem, in which a neuron consistently receives negative inputs, produces zero gradients, and stops learning. Although modern ReLU-like activations, such as Leaky ReLU, GELU, and Swish, reduce ReLU's zero-gradient region, they still contain input domains where gradients remain significantly diminished, contributing to slower optimization and hindering rapid convergence. FAN addresses this limitation by introducing a more stable gradient pathway. This analysis shifts the understanding of FAN's benefits from a spectral interpretation to a concrete analysis of training dynamics, leading to the development of the Dual-Activation Layer (DAL), a more efficient convergence accelerator. We evaluate DAL on three tasks: classification of noisy sinusoidal signals versus pure noise, MNIST digit classification, and Electrocardiogram (ECG)-based biometric recognition. In all cases, DAL models converge faster and achieve equal or higher validation accuracy compared to models with conventional activations.
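The gradient argument in the abstract can be illustrated with a small sketch. It is not the paper's DAL or FAN implementation, whose exact definitions are not given here. The function names (`relu_grad`, `sin_grad`, `mixed_activation`, `mixed_grads`) and the `sine_fraction` parameter are hypothetical, chosen only to show why a sine unit keeps a usable gradient where a ReLU unit does not.

```python
import math

def relu_grad(x):
    """Derivative of ReLU: 0 for x <= 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

def sin_grad(x):
    """Derivative of sin(x) is cos(x), which is close to 1 near x = 0."""
    return math.cos(x)

def mixed_activation(pre_activations, sine_fraction=0.25):
    """Hypothetical FAN-style split (an assumption, not the paper's layer):
    the first k units use sin, the remaining units use ReLU."""
    k = int(len(pre_activations) * sine_fraction)
    return [math.sin(z) if i < k else max(0.0, z)
            for i, z in enumerate(pre_activations)]

def mixed_grads(pre_activations, sine_fraction=0.25):
    """Per-unit activation derivatives for the same hypothetical split."""
    k = int(len(pre_activations) * sine_fraction)
    return [sin_grad(z) if i < k else relu_grad(z)
            for i, z in enumerate(pre_activations)]

# A "dead" layer: every unit receives a small negative pre-activation.
z = [-0.3, -0.1, -0.2, -0.05]

# With ReLU everywhere, no unit passes any gradient back.
relu_only = [relu_grad(v) for v in z]

# With half of the units switched to sine, those units still pass
# a gradient close to 1, because cos(x) is near 1 around x = 0.
mixed = mixed_grads(z, sine_fraction=0.5)

print("ReLU-only gradients:", relu_only)
print("Mixed gradients:    ", [round(g, 3) for g in mixed])
```

In this toy setting the ReLU-only layer has zero gradient on every unit, which is the dying-ReLU situation the abstract describes, while the sine units retain gradients near 1 for the same inputs. The effect depends only on the behavior of sine near zero, not on its periodicity, which matches the mechanism the authors propose.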
