LG · Sep 29, 2023

Generalized Activation via Multivariate Projection

Jiayun Li, Yuxiao Cheng, Yiwen Lu, Zhuofan Xia, Yilin Mo, Gao Huang

arXiv:2309.17194v22.0 · h-index: 12 · Has Code

Originality: Incremental advance

AI Analysis

This work proposes a new activation function aimed at improving neural network performance. The contribution appears incremental because it builds on existing projection concepts.

The paper addresses the limitations of the ReLU activation function by introducing a Multivariate Projection Unit (MPU), a generalized projection onto a convex cone such as the Second-Order Cone (SOC). It proves that FNNs activated by SOC projections have greater expressive power than those using ReLU, and it reports experiments supporting MPU's effectiveness.

Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof establishing that FNNs activated by SOC projections outperform those utilizing ReLU in terms of expressive power. Experimental evaluations on widely adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
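
To make the projection view concrete, here is a minimal sketch in plain Python of ReLU as a projection onto R+ and of the standard Euclidean projection onto the second-order cone {(x, t) : ||x||_2 <= t}. The function names `relu`, `soc_project`, and `mpu`, the choice of the last coordinate as t, and the `group_size` parameter are assumptions made for illustration. The abstract does not say how the paper groups inputs, so this should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    """Projection of a scalar onto the nonnegative half-line R+."""
    return max(x, 0.0)


def soc_project(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v = (x_1, ..., x_{m-1}, t) onto the
    second-order cone {(x, t) : ||x||_2 <= t}.

    Assumption: the last coordinate plays the role of t.
    """
    *x, t = v
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= t:
        # Already inside the cone: the projection is the point itself.
        return [float(c) for c in v]
    if norm <= -t:
        # Inside the polar cone: the nearest cone point is the origin.
        return [0.0] * len(v)
    # Otherwise the projection lies on the cone boundary (here norm > 0).
    scale = (norm + t) / 2.0
    return [scale * xi / norm for xi in x] + [scale]


def mpu(z: Sequence[float], group_size: int = 3) -> List[float]:
    """Hypothetical multivariate activation: split z into consecutive
    groups of `group_size` and project each group onto the cone."""
    if len(z) % group_size != 0:
        raise ValueError("len(z) must be a multiple of group_size")
    out: List[float] = []
    for i in range(0, len(z), group_size):
        out.extend(soc_project(z[i:i + group_size]))
    return out
```

With `group_size=1` the cone reduces to the nonnegative half-line, so `mpu(z, group_size=1)` matches applying `relu` elementwise. Larger groups give an activation with multiple inputs and multiple outputs, as the abstract describes.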
