LG AI · Feb 12, 2021

Exploiting Spline Models for the Training of Fully Connected Layers in Neural Network

Kanya Mo, Shen Zheng, Xiwei Wang, Jinghua Wang, Klaus-Dieter Schewe

arXiv:2102.06554v11.6

Originality: Incremental advance

AI Analysis

This work addresses training challenges for fully connected layers in neural networks, offering improvements in efficiency and interpretability, but it is incremental as it builds on existing spline methods.

The paper tackles the difficulty and inefficiency of training fully connected layers in neural networks by proposing a spline-based approach that first fits a continuous piece-wise linear model to the data and then constructs and trains an ANN from it. The method reduces computational cost, accelerates convergence, and increases interpretability compared to standard random initialization and gradient descent.

The fully connected (FC) layer, one of the most fundamental modules in artificial neural networks (ANNs), is often considered difficult and inefficient to train, for reasons that include the risk of overfitting caused by its large number of parameters. Building on previous work that studies ANNs from a linear-spline perspective, we propose a spline-based approach that eases the difficulty of training FC layers. Given a dataset, we first obtain a continuous piece-wise linear (CPWL) fit through spline methods such as multivariate adaptive regression splines (MARS). Next, we construct an ANN model from the linear spline model and continue to train the ANN model on the dataset using gradient descent optimization algorithms. Our experimental results and theoretical analysis show that our approach reduces the computational cost, accelerates the convergence of FC layers, and significantly increases the interpretability of the resulting model (FC layers) compared with standard ANN training with random parameter initialization followed by gradient descent optimization.
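The step from a linear spline model to an ANN can be illustrated with a small sketch. It is not the authors' construction, which this page does not spell out; it only assumes a one-dimensional additive spline whose terms are hinge functions of the form coef * max(0, sign * (x - knot)), as MARS produces in the univariate case. Each hinge is then exactly one ReLU unit, so the spline maps to a one-hidden-layer FC network whose weights could serve as the starting point for further gradient descent. The function names (`spline_predict`, `spline_to_fc`, `fc_forward`) and the example coefficients are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def spline_predict(x, intercept, hinges):
    """Evaluate intercept + sum of coef * max(0, sign * (x - knot)).

    hinges is a list of (coef, sign, knot) tuples with sign in {+1, -1}.
    """
    y = np.full_like(x, intercept, dtype=float)
    for coef, sign, knot in hinges:
        y += coef * np.maximum(0.0, sign * (x - knot))
    return y

def spline_to_fc(intercept, hinges):
    """Map the hinge terms to FC-layer parameters (assumed construction).

    Hidden unit i computes relu(sign_i * x - sign_i * knot_i);
    the output layer weights the units by coef_i and adds the intercept.
    """
    W1 = np.array([[sign] for _, sign, _ in hinges], dtype=float)          # (H, 1)
    b1 = np.array([-sign * knot for _, sign, knot in hinges], dtype=float)  # (H,)
    W2 = np.array([[coef for coef, _, _ in hinges]], dtype=float)          # (1, H)
    b2 = np.array([intercept], dtype=float)                                 # (1,)
    return W1, b1, W2, b2

def fc_forward(x, W1, b1, W2, b2):
    """Forward pass of the one-hidden-layer ReLU network on 1-D inputs."""
    h = relu(x[:, None] @ W1.T + b1)   # (N, H)
    return (h @ W2.T + b2).ravel()     # (N,)

# Hypothetical fitted spline: two hinges and an intercept.
intercept = 0.25
hinges = [(1.5, 1.0, 0.0), (-0.5, -1.0, 1.0)]

x = np.linspace(-2.0, 2.0, 9)
params = spline_to_fc(intercept, hinges)
y_spline = spline_predict(x, intercept, hinges)
y_net = fc_forward(x, *params)
print(np.allclose(y_spline, y_net))
```

Under these assumptions the network reproduces the spline exactly before any gradient step is taken, which is one way to read the claim that a spline-derived initialization starts training closer to a good fit than random initialization. Multivariate MARS models with product terms are not covered by this sketch.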
