SYLG, Nov 26, 2022

A Tutorial on Neural Networks and Gradient-free Training

arXiv:2211.17217v1, 1 citation, h-index: 26
Originality: Synthesis-oriented
AI Analysis

This is an incremental tutorial aimed at learners in machine learning; it offers no new research findings.

The paper provides a tutorial on representing neural networks mathematically and compares gradient-based training with two gradient-free methods in terms of convergence rate and prediction accuracy.

This paper presents a compact, matrix-based representation of neural networks in a self-contained tutorial fashion. Although neural networks are well understood pictorially in terms of interconnected neurons, they are, mathematically, nonlinear functions constructed by composing several vector-valued functions, and we develop them in that form. Using basic results from linear algebra, we represent a neural network as an alternating sequence of linear maps and scalar nonlinear functions, the latter also known as activation functions. Training a neural network requires minimizing a cost function, which in turn requires computing a gradient. Using basic results from multivariable calculus, we show that the cost gradient is also a function composed of a sequence of linear maps and nonlinear functions. In addition to the analytical gradient computation, we consider two gradient-free training methods and compare the three training methods in terms of convergence rate and prediction accuracy.
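
The ideas in the abstract can be illustrated with a minimal sketch, which is not the paper's own code. It assumes a tanh activation, a squared-error cost, and a single training sample, none of which are specified in the abstract. The function names (`init_params`, `forward`, `cost`, `gradient`, `random_search_step`) are hypothetical. The forward pass alternates a linear map with an elementwise activation, the gradient is computed by passing back through the same sequence of maps, and the random-perturbation step is only one simple example of gradient-free training; the abstract does not name the two gradient-free methods the paper compares.

```python
import math
import random

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_params(sizes, rng):
    """One (W, b) pair per layer; weights uniform in [-1, 1], biases zero (assumed)."""
    return [([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(params, x):
    """Alternate linear maps and tanh; return the activations of every layer."""
    acts = [x]
    for W, b in params:
        z = [zi + bi for zi, bi in zip(matvec(W, acts[-1]), b)]
        acts.append([math.tanh(zi) for zi in z])
    return acts

def cost(params, x, y):
    """Squared-error cost for one sample (an assumed choice of cost)."""
    out = forward(params, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def gradient(params, x, y):
    """Analytical gradient of the cost, one (dW, db) pair per layer."""
    acts = forward(params, x)
    # dC/dz at the output: (a - y) * tanh'(z), using tanh'(z) = 1 - a^2.
    delta = [(a - t) * (1 - a * a) for a, t in zip(acts[-1], y)]
    grads = []
    for layer in range(len(params) - 1, -1, -1):
        W, _ = params[layer]
        a_prev = acts[layer]  # input to this layer
        grads.append(([[d * ap for ap in a_prev] for d in delta], delta[:]))
        if layer > 0:
            # Linear map W^T delta, then scale by the previous layer's tanh'.
            back = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(a_prev))]
            delta = [bk * (1 - ap * ap) for bk, ap in zip(back, a_prev)]
    grads.reverse()
    return grads

def random_search_step(params, x, y, rng, scale=0.1):
    """Gradient-free step: keep a Gaussian perturbation only if it lowers the cost."""
    trial = [([[w + rng.gauss(0, scale) for w in row] for row in W],
              [bi + rng.gauss(0, scale) for bi in b]) for W, b in params]
    return trial if cost(trial, x, y) < cost(params, x, y) else params
```

For example, `params = init_params([2, 3, 1], random.Random(0))` builds a network with two inputs, three hidden units, and one output. `gradient(params, x, y)` can then be checked against a finite-difference estimate of `cost`, and repeated calls to `random_search_step` give a baseline that never increases the cost.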

