ML LG ME Mar 2, 2021

Significance tests of feature relevance for a black-box learner

arXiv:2103.04985v310.239 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretable and efficient significance testing in scientific applications using deep learning, though it is incremental as it builds on existing black-box testing methods.

The paper tackles significance testing for feature relevance in black-box deep neural networks, which is challenging because the limiting distributions are unknown and existing methods are computationally demanding. It proposes one-split and two-split tests that relax assumptions and reduce complexity, and demonstrates their utility on simulated and real datasets with an accompanying Python library.

An exciting recent development is the uptake of deep neural networks in many scientific fields, where the main objective is outcome prediction with a black-box model. Significance testing is a promising way to address the black-box issue and to explore novel scientific insights and interpretations of the decision-making process of a deep learning model. However, testing for a neural network poses a challenge because of its black-box nature and the unknown limiting distributions of its parameter estimates, while existing methods require strong assumptions or excessive computation. In this article, we derive one-split and two-split tests that relax the assumptions and reduce the computational complexity of existing black-box tests, and that extend to examining the significance of a collection of features of interest in a dataset of a possibly complex type, such as images. The one-split test estimates and evaluates a black-box model on estimation and inference subsets obtained through sample splitting and data perturbation. The two-split test further splits the inference subset into two but requires no perturbation. We also develop combined versions of both tests by aggregating the p-values from repeated sample splitting. By deflating the bias-sd-ratio, we establish the asymptotic null distributions of the test statistics and their consistency in terms of Type II error. Numerically, we demonstrate the utility of the proposed tests on seven simulated examples and six real datasets. Accompanying this paper is our Python library dnn-inference (https://dnn-inference.readthedocs.io/en/latest/), which implements the proposed tests.
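
To make the two-split idea concrete, here is a minimal sketch in plain Python. It is not the dnn-inference API and not the paper's exact procedure: the names `fit_linear`, `two_split_test`, and `combine_pvalues` are hypothetical, a least-squares fit stands in for the black-box learner, features are masked by zeroing, and the statistic is assumed to be a standardized difference of mean squared losses computed on two disjoint halves of the inference subset. The combining rule (twice the median p-value, capped at 1) is one generic conservative choice and may differ from the rules used in the paper.

```python
import math
import random
from statistics import fmean, median, variance


def fit_linear(X, y):
    """Toy stand-in for a black-box learner: least squares with an intercept."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]; the tiny ridge keeps the
    # system solvable when a masked column is identically zero.
    M = [[sum(r[i] * r[j] for r in rows) + (1e-8 if i == j else 0.0)
          for j in range(d)] + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    for c in range(d):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, d), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(d):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    w = [M[i][d] / M[i][i] for i in range(d)]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_split_test(X, y, features, fit, n_est, seed=0):
    """Return (statistic, one-sided p-value) for H0: `features` are irrelevant."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    est, inf = idx[:n_est], idx[n_est:]
    m = len(inf) // 2
    inf_a, inf_b = inf[:m], inf[m:2 * m]  # two disjoint inference halves

    def mask(x):  # zero out the hypothesized features
        return [0.0 if j in features else v for j, v in enumerate(x)]

    y_est = [y[i] for i in est]
    full = fit([X[i] for i in est], y_est)           # model on all features
    masked = fit([mask(X[i]) for i in est], y_est)   # model without them

    # Each model is scored on its own half, so the two loss samples are
    # independent and no perturbation is needed to keep the variance positive.
    loss_full = [(y[i] - full(X[i])) ** 2 for i in inf_a]
    loss_masked = [(y[i] - masked(mask(X[i]))) ** 2 for i in inf_b]
    se = math.sqrt((variance(loss_full) + variance(loss_masked)) / m)
    stat = (fmean(loss_full) - fmean(loss_masked)) / se
    return stat, normal_cdf(stat)  # small p: the full model predicts better


def combine_pvalues(pvals):
    """Conservative combination over repeated splits: twice the median."""
    return min(1.0, 2.0 * median(pvals))


# Simulated data: feature 0 drives the outcome, feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
y = [2.0 * x[0] + rng.gauss(0, 1) for x in X]

p_relevant = combine_pvalues(
    [two_split_test(X, y, {0}, fit_linear, 1000, seed=s)[1] for s in range(5)])
p_irrelevant = combine_pvalues(
    [two_split_test(X, y, {1}, fit_linear, 1000, seed=s)[1] for s in range(5)])
print(p_relevant, p_irrelevant)
```

Any learner exposing the same `fit(X, y) -> predict` shape could replace `fit_linear` here, which is the sense in which the test treats the model as a black box. The size of the estimation subset (`n_est`) is fixed by hand in this sketch.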
