An in-depth look at approximation via deep and narrow neural networks
This is an incremental theoretical analysis for neural network researchers, focusing on edge cases in approximation theory.
The paper tackles the problem of approximating a specific counterexample function using deep neural networks with widths at or just above the theoretical threshold (w=n and w=n+1), finding that approximation quality varies with depth and is affected by dying neurons.
In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves as we vary the depth and which effect (spoiler alert: dying neurons) causes that behavior.
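To make the setting concrete, the following is a minimal sketch of a deep, narrow ReLU network of width w on R^n, together with a simple count of dead neurons, here taken to mean hidden units that output zero on every sample. It is not the setup used in the note: the initialization scheme, the sampling domain [-1, 1]^n, the chosen depths, and all function names are assumptions made purely for illustration, and the counterexample function f is not reproduced here.

```python
import numpy as np

def init_narrow_relu_net(n, w, depth, rng):
    """Random weights for a ReLU net R^n -> R with `depth` hidden layers of width w."""
    sizes = [n] + [w] * depth + [1]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialization; an assumption, not taken from the note.
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Return the network output and the list of hidden-layer activations."""
    hidden = []
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU
        hidden.append(h)
    W, b = params[-1]
    return h @ W + b, hidden  # final layer is affine, no activation

def dead_fraction(hidden):
    """Fraction of hidden neurons that output zero on every sample."""
    dead = [np.all(h == 0.0, axis=0) for h in hidden]
    return float(np.concatenate(dead).mean())

rng = np.random.default_rng(0)
n = 2
x = rng.uniform(-1.0, 1.0, size=(1000, n))  # samples from a compact set (assumed domain)

# Compare the two widths around the threshold for a few depths.
for w in (n, n + 1):
    for depth in (2, 8, 32):
        params = init_narrow_relu_net(n, w, depth, rng)
        y, hidden = forward(params, x)
        print(f"w={w} depth={depth} dead fraction={dead_fraction(hidden):.2f}")
```

The sketch only inspects untrained networks at initialization. Reproducing the note's experiments would additionally require the counterexample function f and a training loop, both of which are left out here.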