LG ML May 11, 2021

Leveraging Sparse Linear Layers for Debuggable Deep Networks

Eric Wong, Shibani Santurkar, Aleksander Mądry

arXiv:2105.04857v128.0102 citations Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for more interpretable AI models in vision and language tasks, though the advance is incremental because it builds on existing sparse linear methods.

The paper tackles the problem of making deep networks more debuggable by fitting sparse linear models over learned features, resulting in networks that maintain high accuracy while improving interpretability, as demonstrated through numerical and human experiments.

We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively via numerical and human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks. The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks.
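To make the core idea concrete, here is a minimal sketch of fitting a sparse linear decision layer over fixed features. It is not the authors' toolkit. It makes several assumptions for illustration: random Gaussian vectors stand in for deep features, the task is binary, sparsity comes from a plain L1 penalty (one common choice, not necessarily the regularizer the paper uses), and the helper names `soft_threshold` and `fit_sparse_linear` are hypothetical.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink every weight toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_sparse_linear(features, labels, l1=0.05, lr=0.1, steps=2000):
    """L1-regularized logistic regression fit by proximal gradient descent.

    Hypothetical helper for illustration only; `features` plays the role of
    frozen penultimate-layer activations and is never updated.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        residual = probs - labels
        grad_w = features.T @ residual / n
        grad_b = residual.mean()
        # Gradient step on the logistic loss, then the L1 proximal step,
        # which sets small weights exactly to zero.
        w = soft_threshold(w - lr * grad_w, lr * l1)
        b -= lr * grad_b
    return w, b


# Assumed stand-in for deep features: 500 samples, 50 feature dimensions.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 50))

# Synthetic labels that depend on only the first three features.
true_w = np.zeros(50)
true_w[:3] = [2.0, -2.0, 1.5]
labels = (features @ true_w + 0.1 * rng.normal(size=500) > 0).astype(float)

w, b = fit_sparse_linear(features, labels)
active = np.flatnonzero(w)  # features the sparse layer actually uses
accuracy = np.mean(((features @ w + b) > 0) == (labels > 0.5))

print("active features:", active.tolist())
print("training accuracy:", round(float(accuracy), 3))
```

Because each prediction now depends on only the few features listed in `active`, a person can inspect those features one at a time, which is the sense in which a sparse final layer is easier to debug than a dense one. Raising `l1` trades accuracy for a shorter list of active features.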
