On the failure of ReLU activation for physics-informed machine learning
This addresses a specific issue in physics-informed machine learning for researchers, but it is incremental as it builds on known limitations of ReLU.
The paper investigates why the ReLU activation function underperforms in physics-informed machine learning, even on first-order variational problems, and identifies that automatic differentiation in PyTorch mis-specifies gradients due to second derivatives of discontinuous fields.
Physics-informed machine learning uses governing ordinary and/or partial differential equations to train neural networks to represent the solution field. Like any machine learning problem, the choice of activation function influences the characteristics and performance of the solution obtained from physics-informed training. Several studies have compared common activation functions on benchmark differential equations, and have unanimously found that the rectified linear unit (ReLU) is outperformed by competitors such as the sigmoid, hyperbolic tangent, and swish activation functions. In this work, we diagnose the poor performance of ReLU on physics-informed machine learning problems. While it is well-known that the piecewise linear form of ReLU prevents it from being used on second-order differential equations, we show that ReLU fails even on variational problems involving only first derivatives. We identify the cause of this failure as second derivatives of the activation, which are taken not in the formulation of the loss, but in the process of training. Namely, we show that automatic differentiation in PyTorch fails to characterize derivatives of discontinuous fields, which causes the gradient of the physics-informed loss to be mis-specified, thus explaining the poor performance of ReLU.