ML LG · May 1, 2023

Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression

Guohao Shen, Yuling Jiao, Yuanyuan Lin, Jian Huang

arXiv:2305.00608v3 · 11.87 citations

Originality: Incremental advance

AI Analysis

This work provides theoretical foundations for using RePU networks in statistical estimation and is relevant to researchers in machine learning and statistics. It is an incremental advance, building on existing neural network approximation theory.

The paper studies how well neural networks with rectified power unit (RePU) activations can approximate smooth functions and their derivatives at the same time. It establishes approximation error bounds and applies them to score estimation and isotonic regression, deriving non-asymptotic risk bounds and showing that the curse of dimensionality is mitigated when the data have approximately low-dimensional support.
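To see why approximating derivatives matters for score estimation, it may help to recall the classical score matching objective of Hyvärinen (2005). We assume here, without confirmation from this page, that the paper's estimator is built on an objective of this type; the symbols $s$, $p_0$ and $J$ are our own notation. For a candidate score function $s:\mathbb{R}^d \to \mathbb{R}^d$ and data density $p_0$:

$$
J(s) \;=\; \mathbb{E}_{x \sim p_0}\!\left[\operatorname{tr}\!\big(\nabla_x s(x)\big) + \tfrac{1}{2}\,\lVert s(x)\rVert^2\right]
\;=\; \tfrac{1}{2}\,\mathbb{E}_{x \sim p_0}\!\left[\lVert s(x) - \nabla_x \log p_0(x)\rVert^2\right] + \text{const},
$$

where the constant does not depend on $s$ and the identity holds under mild regularity and boundary conditions. Because the objective involves $\nabla_x s$, a network used for $s$ needs well-behaved derivatives, which is where differentiable activations such as RePU come in.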

We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by networks with mixed RePU activations, and we derive upper bounds on the complexity of the function class of derivatives of RePU networks. We establish error bounds for simultaneously approximating $C^s$ smooth functions and their derivatives using RePU-activated deep neural networks. Furthermore, we derive improved approximation error bounds when the data have an approximate low-dimensional support, demonstrating the ability of RePU networks to mitigate the curse of dimensionality. To illustrate the usefulness of our results, we consider a deep score matching estimator (DSME) and propose a penalized deep isotonic regression (PDIR) using RePU networks. We establish non-asymptotic excess risk bounds for DSME and PDIR under the assumption that the target functions belong to a class of $C^s$ smooth functions. We also show that PDIR achieves the minimax optimal convergence rate and is robust in the sense that it remains consistent with vanishing penalty parameters even when the monotonicity assumption is not satisfied. Finally, if the data distribution is supported on an approximate low-dimensional manifold, we show that DSME and PDIR can mitigate the curse of dimensionality.
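The claim that derivatives of a RePU network are themselves networks with mixed RePU activations can be illustrated with a small numerical sketch. It relies on the standard identity that, for order p ≥ 2, the derivative of max(0, x)^p is p · max(0, x)^(p-1). The one-hidden-layer architecture, the function names (`repu`, `net`, `net_grad`) and the random weights below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def repu(x, p=2):
    """Rectified power unit of order p: max(0, x) ** p."""
    return np.maximum(x, 0.0) ** p

def net(x, W1, b1, w2, p=2):
    """Toy one-hidden-layer RePU network: f(x) = w2 . repu_p(W1 x + b1)."""
    return w2 @ repu(W1 @ x + b1, p)

def net_grad(x, W1, b1, w2, p=2):
    """Gradient of `net` w.r.t. x (assumes p >= 2).

    Uses d/dz repu_p(z) = p * repu_{p-1}(z), so the gradient is again a
    network, now with the lower-order activation repu_{p-1}.
    """
    z = W1 @ x + b1
    return W1.T @ (w2 * p * repu(z, p - 1))

# Hypothetical sizes and random weights, chosen only for the check below.
rng = np.random.default_rng(0)
d, h, p = 3, 8, 2
W1 = rng.normal(size=(h, d))
b1 = rng.normal(size=h)
w2 = rng.normal(size=h)
x = rng.normal(size=d)

# Central finite differences as an independent reference for the gradient.
eps = 1e-6
fd = np.array([
    (net(x + eps * e, W1, b1, w2, p) - net(x - eps * e, W1, b1, w2, p)) / (2 * eps)
    for e in np.eye(d)
])
max_err = np.max(np.abs(fd - net_grad(x, W1, b1, w2, p)))
print("max abs difference between analytic and finite-difference gradient:", max_err)
```

With p = 2 the gradient network uses the order-1 unit, which is ReLU; this is the sense in which the derivative class mixes activations of different orders. For deeper networks the chain rule multiplies such factors across layers, which this single-layer sketch does not cover.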
