Library network, a possible path to explainable neural networks
This addresses the need for explainable and trustworthy DNNs in high-stakes applications, but appears incremental as it builds on existing concerns without claiming major breakthroughs.
The paper tackles the problem of lack of transparency and vulnerability to adversarial attacks in deep neural networks (DNNs) by proposing an algorithm that traces decision processes across layers and detects attacks, with empirical evaluations showing effectiveness.
Deep neural networks (DNNs) may outperform human brains in complex tasks, but the lack of transparency in their decision-making processes makes us question whether we could fully trust DNNs with high stakes problems. As DNNs' operations rely on a massive number of both parallel and sequential linear/nonlinear computations, predicting their mistakes is nearly impossible. Also, a line of studies suggests that DNNs can be easily deceived by adversarial attacks, indicating that their decisions can easily be corrupted by unexpected factors. Such vulnerability must be overcome if we intend to take advantage of DNNs' efficiency in high stakes problems. Here, we propose an algorithm that can help us better understand DNNs' decision-making processes. Our empirical evaluations suggest that this algorithm can effectively trace DNNs' decision processes from one layer to another and detect adversarial attacks.