LG · AI · NE · Mar 14, 2024

Towards White Box Deep Learning

arXiv:2403.09863v5 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the interpretability and robustness problems of deep learning, which matters for deploying AI in safety-critical domains, though it appears to be a proof of concept rather than a broad state-of-the-art advance.

The paper tackles the problem of deep neural networks being black boxes and vulnerable to adversarial attacks by proposing semantic features as an architectural solution, resulting in a lightweight, interpretable network that achieves near-human-level adversarial test metrics without adversarial training.

Deep neural networks learn fragile "shortcut" features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof of concept network is lightweight, inherently interpretable and achieves almost human-level adversarial test metrics - with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn
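To make the phrase "locality-sensitive in the adequate semantic topology" more concrete, here is a toy sketch. It is not the paper's architecture or code. It assumes, purely for illustration, that the "semantic topology" of a 1-D signal is given by small cyclic shifts, and it scores an input by its best cosine similarity to any allowed shifted copy of a base pattern. The names `shift`, `cosine`, and `feature_match` are hypothetical.

```python
import math

def shift(v, k):
    """Cyclically shift list v by k positions.

    Assumption: small shifts stand in for 'semantically close' variants
    of a pattern; the paper's actual transformations may differ.
    """
    n = len(v)
    return [v[(i - k) % n] for i in range(n)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def feature_match(x, base, shifts=(-1, 0, 1)):
    """Best similarity between x and any allowed shifted copy of base."""
    return max(cosine(x, shift(base, k)) for k in shifts)

base   = [0, 0, 1, 1, 0, 0]   # hypothetical base pattern
nudged = [0, 0, 0, 1, 1, 0]   # the same pattern moved one step right
far    = [1, 0, 0, 0, 0, 1]   # a pattern outside the allowed neighbourhood

print(feature_match(nudged, base))              # close to 1: within the neighbourhood
print(feature_match(far, base))                 # 0.0: no overlap with any allowed shift
print(feature_match(nudged, base, shifts=(0,))) # lower: a rigid template misses the nudge
```

The design point this sketch tries to convey is that the feature responds to a whole neighbourhood of a pattern rather than to one exact template, so an input has to change meaningfully, not just slightly, to change the response.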

Code Implementations: 1 repo
