LG | ML | Oct 2, 2019

Reverse-Engineering Deep ReLU Networks

arXiv:1910.00744v2 | 124 citations
AI Analysis

This paper addresses a fundamental question for interpretability and security in deep learning: whether a network's parameters can be recovered from its outputs alone. It offers a new theoretical result rather than an incremental improvement.

The paper tackles the problem of recovering the internal parameters of an unknown deep ReLU network from its output. It proves that it is often possible to identify the architecture, weights, and biases by analyzing the boundaries between the linear regions of the piecewise linear function the network computes.

It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture, weights, and biases of an unknown deep ReLU network by observing only its output. Every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.
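The link between region boundaries and individual neurons can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hypothetical one-hidden-layer network with a scalar input, made-up parameters, and kinks that lie at least three grid cells apart. The helper names `make_network` and `find_kinks` are invented for this illustration. The code treats the network as a black box, flags grid points where the second difference of the output is nonzero, and locates each kink by intersecting the straight lines fitted on either side of it.

```python
# Toy stand-in for an unknown network (all parameters below are hypothetical):
# f(x) = c + sum_i v[i] * relu(w[i] * x + b[i]), with a scalar input x.
def relu(z):
    return z if z > 0.0 else 0.0


def make_network(w, b, v, c):
    def f(x):
        return c + sum(vi * relu(wi * x + bi) for wi, bi, vi in zip(w, b, v))
    return f


def find_kinks(f, lo, hi, n=2000, tol=1e-9):
    """Locate slope changes of a piecewise linear f on [lo, hi] using only queries to f.

    Assumes kinks are at least three grid cells apart.
    """
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    ys = [f(x) for x in xs]
    # A nonzero second difference at grid point i means f bends within one cell of xs[i].
    bent = [i for i in range(1, n)
            if abs(ys[i - 1] - 2.0 * ys[i] + ys[i + 1]) > tol]

    kinks = []
    k = 0
    while k < len(bent):
        i = bent[k]
        # A kink strictly inside the cell (xs[i], xs[i+1]) flags both i and i + 1.
        paired = k + 1 < len(bent) and bent[k + 1] == i + 1
        j = i + 1 if paired else i
        slope_left = (ys[i] - ys[i - 1]) / h
        slope_right = (ys[j + 1] - ys[j]) / h
        if abs(slope_left - slope_right) > tol:
            # Intersect the line through grid points i-1, i with the line through j, j+1.
            x_star = (ys[j] - ys[i] + slope_left * xs[i] - slope_right * xs[j]) \
                / (slope_left - slope_right)
            kinks.append(x_star)
        k += 2 if paired else 1
    return kinks


w, b, v, c = [1.0, -2.0, 0.5], [-1.0, 3.0, 1.0], [1.0, 0.5, -2.0], 0.2
f = make_network(w, b, v, c)
recovered = find_kinks(f, -4.0, 4.0)
# Each hidden neuron switches between inactive and active at x = -b/w.
expected = sorted(-bi / wi for wi, bi in zip(w, b))
print(recovered)
print(expected)
```

In this toy setting each recovered kink matches the switching point of one hidden neuron, but it only pins down the ratio -b/w for that neuron, not the weight and bias separately. In higher input dimensions the boundaries are pieces of hyperplanes rather than points, which is the setting the abstract describes.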

