CL | Dec 21, 2018

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks

Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass

arXiv:1812.09359v14.354 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This toolkit addresses the need of researchers and practitioners for better interpretability of neural networks, though its contribution is incremental because it builds on existing analysis methods.

The authors address the problem of interpreting neural networks with NeuroX, a toolkit for analyzing individual neurons. It lets users identify salient neurons, visualize them, ablate them to measure the effect on accuracy, and manipulate them to control model behavior.

We present a toolkit to facilitate the interpretation and understanding of neural network models. The toolkit provides several methods to identify salient neurons with respect to the model itself or an external task. A user can visualize selected neurons, ablate them to measure their effect on model accuracy, and manipulate them to control the behavior of the model at test time. Such an analysis has the potential to serve as a springboard for various research directions, such as understanding the model, making better architectural choices, model distillation, and controlling data biases.
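The identify-then-ablate workflow described in the abstract can be illustrated with a minimal sketch. It does not use the NeuroX API. It assumes synthetic activations, a hand-written logistic-regression probe, and ranking by absolute probe weight as one plausible way to find neurons salient for an external task. All names (`train_probe`, `ablate`, the choice of neurons 3 and 7 as signal carriers) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "activations": 1000 examples x 20 neurons (assumption, not real model data).
# Only neurons 3 and 7 are constructed to carry information about the binary label.
n_examples, n_neurons = 1000, 20
labels = rng.integers(0, 2, size=n_examples)
activations = rng.normal(size=(n_examples, n_neurons))
signal = 2.0 * labels - 1.0  # map labels {0, 1} to {-1, +1}
activations[:, 3] += 2.0 * signal
activations[:, 7] += 1.5 * signal


def train_probe(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def accuracy(X, y, w, b):
    """Fraction of examples the probe classifies correctly."""
    preds = (X @ w + b > 0).astype(int)
    return float(np.mean(preds == y))


def ablate(X, neurons):
    """Return a copy of X with the given neuron columns zeroed out."""
    X = X.copy()
    X[:, neurons] = 0.0
    return X


w, b = train_probe(activations, labels)

# Rank neurons from most to least salient by absolute probe weight.
ranking = np.argsort(-np.abs(w))
top2, bottom2 = ranking[:2], ranking[-2:]

acc_full = accuracy(activations, labels, w, b)
acc_top_ablated = accuracy(ablate(activations, top2), labels, w, b)
acc_bottom_ablated = accuracy(ablate(activations, bottom2), labels, w, b)

print("most salient neurons:", sorted(top2.tolist()))
print("accuracy, no ablation:", acc_full)
print("accuracy, top-2 ablated:", acc_top_ablated)
print("accuracy, bottom-2 ablated:", acc_bottom_ablated)
```

In this toy setting, zeroing the two highest-ranked neurons should push probe accuracy toward chance, while zeroing the two lowest-ranked ones should leave it almost unchanged. That contrast is what an ablation study is meant to expose.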

