NE, Feb 28, 2018
Avoiding overfitting of multilayer perceptrons by training derivatives
V. I. Avrutskiy
Resistance to overfitting is observed for neural networks trained with an extended backpropagation algorithm. In addition to target values, its cost function uses their derivatives up to the $4^{\mathrm{th}}$ order. For common applications of neural networks, high-order derivatives are not readily available, so simpler cases are considered: training a network to approximate an analytical function inside 2D and 5D domains, and solving the Poisson equation inside a 2D circle. For function approximation, the cost is a sum of squared differences between output and target, as well as between their derivatives with respect to the input. Differential equations are usually solved by putting a multilayer perceptron in place of the unknown function and training its weights so that the equation holds within some margin of error. The commonly used cost is the equation's residual squared. The added terms are squared derivatives of that residual with respect to the independent variables. To investigate overfitting, the cost is minimized on points of regular grids with various spacings, and its root mean is compared with its value on a much denser test set. Fully connected perceptrons with six hidden layers and $2\cdot10^{4}$, $1\cdot10^{6}$ and $5\cdot10^{6}$ weights in total are trained with Rprop until the cost changes by less than 10% over the last 1000 epochs, or until the $10000^{\mathrm{th}}$ epoch is reached. Training the network with $5\cdot10^{6}$ weights to represent a simple 2D function using 10 points with 8 extra derivatives at each produces a test-to-train cost ratio of $1.5$, whereas for classical backpropagation under comparable conditions this ratio is $2\cdot10^{4}$.
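The extended cost for function approximation described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the paper's implementation: it uses a 1D input, a single hidden tanh layer, and only the first derivative (the abstract uses derivatives up to the 4th order). The names `net_and_derivative` and `extended_cost`, the weights, and the target $\sin x$ are all hypothetical.

```python
import math

def net_and_derivative(x, w, b, v):
    """Return N(x) = sum_j v_j * tanh(w_j * x + b_j) and its exact derivative dN/dx."""
    value, deriv = 0.0, 0.0
    for wj, bj, vj in zip(w, b, v):
        t = math.tanh(wj * x + bj)
        value += vj * t
        deriv += vj * wj * (1.0 - t * t)  # chain rule: tanh'(u) = 1 - tanh(u)^2
    return value, deriv

def extended_cost(points, target, target_deriv, w, b, v):
    """Mean over points of squared value error plus squared first-derivative error."""
    total = 0.0
    for x in points:
        n, dn = net_and_derivative(x, w, b, v)
        total += (n - target(x)) ** 2 + (dn - target_deriv(x)) ** 2
    return total / len(points)

# Hypothetical setup: a coarse grid, arbitrary untrained weights, target sin(x).
grid = [0.5 * i for i in range(7)]
w, b, v = [1.0, -0.5], [0.0, 0.3], [0.8, -0.2]
cost = extended_cost(grid, math.sin, math.cos, w, b, v)
```

A training loop would minimize `cost` with respect to `w`, `b` and `v`. Comparing the same quantity on a much denser grid gives the test-to-train ratio discussed in the abstract.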
NE, Dec 14, 2017
Neural networks catching up with finite differences in solving partial differential equations in higher dimensions
V. I. Avrutskiy
Fully connected multilayer perceptrons are used to obtain numerical solutions of partial differential equations in various dimensions. Independent variables are fed into the input layer, and the output is taken as the solution's value. To train such a network, one can use the square of the equation's residual as a cost function and minimize it with respect to the weights by gradient descent. Following a previously developed method, derivatives of the equation's residual along random directions in the space of independent variables are also added to the cost function. A similar procedure is known to produce nearly machine-precision results using fewer than 8 grid points per dimension in the 2D case. The same effect is observed here for higher dimensions: solutions are obtained on low-density grids but maintain their precision in the entire region. Boundary value problems for linear and nonlinear Poisson equations are solved inside 2-, 3-, 4- and 5-dimensional balls. Grids for the linear cases have 40, 159, 512 and 1536 points, and for the nonlinear cases 64, 350, 1536 and 6528 points, respectively. In all cases the maximum error is less than $8.8\cdot10^{-6}$, and the median error is less than $2.4\cdot10^{-6}$. Very weak grid requirements enable neural networks to obtain the solution of the 5D linear problem within 22 minutes, whereas the projected solving time for finite differences on the same hardware is 50 minutes. The method is applied to a second-order equation, but requires little to no modification to solve systems or higher-order PDEs.
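As a rough illustration of a residual cost extended with a derivative along a random direction, the sketch below evaluates $R = \Delta N - f$ for a 2D Poisson equation and adds the squared directional derivative of $R$. It rests on several assumptions that are not from the paper: a single hidden tanh layer, one first-order directional derivative per point, a hypothetical right-hand side $f(x, y) = xy$, and no boundary condition term. All function names are made up for the example.

```python
import math
import random

def residual_and_gradient(x, y, params, f, grad_f):
    """Residual R = Laplacian(N) - f and its gradient (dR/dx, dR/dy) for
    N(x, y) = sum_j v_j * tanh(a_j * x + c_j * y + b_j)."""
    fx, fy = grad_f(x, y)
    R, Rx, Ry = -f(x, y), -fx, -fy
    for a, c, b, v in params:
        t = math.tanh(a * x + c * y + b)
        s = 1.0 - t * t               # tanh'(u)
        d2 = -2.0 * t * s             # tanh''(u)
        d3 = s * (6.0 * t * t - 2.0)  # tanh'''(u)
        k = v * (a * a + c * c)       # this unit's factor in the Laplacian
        R += k * d2
        Rx += k * a * d3
        Ry += k * c * d3
    return R, (Rx, Ry)

def extended_residual_cost(points, params, f, grad_f, rng):
    """Mean of R^2 plus the squared derivative of R along a random unit direction."""
    total = 0.0
    for x, y in points:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        R, (Rx, Ry) = residual_and_gradient(x, y, params, f, grad_f)
        dR = dx * Rx + dy * Ry        # directional derivative of the residual
        total += R * R + dR * dR
    return total / len(points)

# Hypothetical setup: two hidden units (a, c, b, v) and a 5 x 5 grid.
params = [(1.0, 0.5, 0.1, 0.7), (-0.3, 0.8, -0.2, 0.4)]
f = lambda x, y: x * y
grad_f = lambda x, y: (y, x)
points = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
cost = extended_residual_cost(points, params, f, grad_f, random.Random(0))
```

Drawing a fresh direction per point keeps the number of extra terms fixed as the dimension grows, which is one plausible reading of why random directions are used instead of all partial derivatives.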
NE, Dec 12, 2017
Enhancing approximation abilities of neural networks by training derivatives
V. I. Avrutskiy
A method to increase the precision of feedforward networks is proposed. It requires prior knowledge of the target function's derivatives of several orders and uses this information in gradient-based training. The forward pass calculates not only the values of the network's output layer but also their derivatives. The deviations of those derivatives from the target ones are used in an extended cost function, and the backward pass then calculates the gradient of the extended cost with respect to the weights, which can be used by any weight update algorithm. Despite a substantial increase in arithmetic operations per pattern compared to conventional training, the extended cost yields a 140--1000 times more accurate approximation for simple cases when the total number of operations is equal. This precision also happens to be out of reach for the regular cost function. The method fits well into the procedure of solving differential equations with neural networks. Unlike training a network to match some target mapping, which requires explicit use of the target derivatives in the extended cost function, the cost function for solving a differential equation is based on the deviation of the equation's residual from zero and thus can be extended by differentiating the equation itself, which does not require any prior knowledge. Solving an equation with such a cost gave a 13 times more accurate result and could be done with a 3 times larger grid step. A GPU-efficient algorithm for calculating the gradient of the extended cost function is proposed.
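A forward pass that carries derivatives alongside values can be sketched as follows. This is a minimal pure-Python illustration, assuming tanh hidden layers, a linear output layer and a single first-order derivative along a chosen input direction. It is not the GPU-efficient algorithm the abstract refers to, and `forward_with_derivative` and the weights are hypothetical.

```python
import math

def forward_with_derivative(x, direction, layers):
    """Propagate activations and their derivative along `direction` in input space.

    layers is a list of (W, b) pairs; W is a list of rows. Hidden layers use tanh,
    the last layer is linear.
    """
    a, da = list(x), list(direction)
    for k, (W, b) in enumerate(layers):
        # Affine part: z = W a + b, and dz = W da (the bias drops out).
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi for row, bi in zip(W, b)]
        dz = [sum(wij * daj for wij, daj in zip(row, da)) for row in W]
        if k < len(layers) - 1:
            a = [math.tanh(zi) for zi in z]
            da = [(1.0 - ai * ai) * dzi for ai, dzi in zip(a, dz)]  # tanh'(z) * dz
        else:
            a, da = z, dz
    return a, da

# Hypothetical 2-3-1 network; direction (1, 0) gives the derivative w.r.t. the first input.
layers = [
    ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
out, dout = forward_with_derivative([0.3, -0.2], [1.0, 0.0], layers)
```

The deviations of `out` and `dout` from their targets would then both enter the extended cost, and a backward pass through the same recursion would give its gradient with respect to the weights.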
NE, Dec 12, 2017
Backpropagation generalized for output derivatives
V. I. Avrutskiy
The backpropagation algorithm is the cornerstone of neural network analysis. This paper extends it to train any derivatives of a neural network's output with respect to its input. By this means, feedforward networks can be used to solve or verify solutions of partial or ordinary, linear or nonlinear differential equations. The method differs vastly from traditional ones such as finite differences on a mesh: it uses no approximations of the differential operators, only their exact form. The algorithm is built to train a feedforward network with any number of hidden layers and any sufficiently smooth activation functions. It is presented in the form of matrix-vector products, so a highly parallel implementation is readily possible. The first part derives the method for the 2D case with first and second order derivatives; the second part extends it to the N-dimensional case with derivatives of any order. All expressions necessary for using this method to solve most applied PDEs can be found in Appendix D.
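For orientation, the layer-by-layer recursion for first and second derivatives follows directly from the chain rule. The notation below is assumed for this sketch and need not match the paper's: $a^{(k)}$ is the activation vector of layer $k$, $W^{(k)}$ and $b^{(k)}$ are its weights and biases, $\sigma$ is the activation function, $\odot$ is the elementwise product, and $x$ is one input variable.

$$
z^{(k)} = W^{(k)} a^{(k-1)} + b^{(k)}, \qquad a^{(k)} = \sigma\left(z^{(k)}\right)
$$

$$
\frac{\partial z^{(k)}}{\partial x} = W^{(k)} \frac{\partial a^{(k-1)}}{\partial x}, \qquad
\frac{\partial a^{(k)}}{\partial x} = \sigma'\left(z^{(k)}\right) \odot \frac{\partial z^{(k)}}{\partial x}
$$

$$
\frac{\partial^{2} z^{(k)}}{\partial x^{2}} = W^{(k)} \frac{\partial^{2} a^{(k-1)}}{\partial x^{2}}, \qquad
\frac{\partial^{2} a^{(k)}}{\partial x^{2}} = \sigma''\left(z^{(k)}\right) \odot \frac{\partial z^{(k)}}{\partial x} \odot \frac{\partial z^{(k)}}{\partial x} + \sigma'\left(z^{(k)}\right) \odot \frac{\partial^{2} z^{(k)}}{\partial x^{2}}
$$

Each step is a matrix-vector product followed by elementwise operations, which is consistent with the abstract's remark about parallel implementation. The expressions are exact, so no finite-difference approximation enters.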