LG ML | Nov 30, 2019

Error-Correcting Output Codes with Ensemble Diversity for Robust Learning in Neural Networks

arXiv:1912.00181v47.123 citations

Originality: Incremental advance

AI Analysis

This work addresses adversarial robustness in neural networks for security-critical applications. Its defense is an incremental advance that complements existing methods rather than replacing them.

The paper tackles the vulnerability of deep learning to adversarial examples by proposing an error-correcting neural network (ECNN): a set of binary classifiers whose code matrix is designed to maximize both fault tolerance and classifier diversity. The authors report that it is effective against state-of-the-art attacks on several datasets while maintaining good accuracy on normal examples.

Though deep learning has been applied successfully in many scenarios, malicious inputs with human-imperceptible perturbations can make it vulnerable in real applications. This paper proposes an error-correcting neural network (ECNN) that combines a set of binary classifiers to combat adversarial examples in the multi-class classification problem. To build an ECNN, we propose to design a code matrix so that the minimum Hamming distance between any two rows (i.e., two codewords) and the minimum shared information distance between any two columns (i.e., two partitions of class labels) are simultaneously maximized. Maximizing row distances can increase the system fault tolerance while maximizing column distances helps increase the diversity between binary classifiers. We propose an end-to-end training method for our ECNN, which allows further improvement of the diversity between binary classifiers. The end-to-end training renders our proposed ECNN different from the traditional error-correcting output code (ECOC) based methods that train binary classifiers independently. ECNN is complementary to other existing defense approaches such as adversarial training and can be applied in conjunction with them. We empirically demonstrate that our proposed ECNN is effective against the state-of-the-art white-box and black-box attacks on several datasets while maintaining good classification accuracy on normal examples.
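The two code-matrix criteria in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names and the 4-class example matrix are hypothetical, "shared information distance" is assumed here to mean the variation of information between two label partitions, and the decoder is classical nearest-codeword ECOC decoding rather than the end-to-end trained ECNN.

```python
import itertools
import numpy as np

def min_row_hamming(M):
    """Smallest Hamming distance between any two rows (codewords)."""
    return min(int(np.sum(a != b)) for a, b in itertools.combinations(M, 2))

def entropy(counts):
    """Shannon entropy in bits of a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def variation_of_information(u, v):
    """VI(u, v) = 2 H(u, v) - H(u) - H(v) for two binary partitions.

    Assumption: this stands in for the paper's column distance.
    It is 0 when two columns induce the same partition of the classes,
    including when one column is the bitwise complement of the other.
    """
    joint = np.zeros((2, 2))
    for a, b in zip(u, v):
        joint[a, b] += 1
    h_joint = entropy(joint.ravel())
    return 2 * h_joint - entropy(joint.sum(axis=1)) - entropy(joint.sum(axis=0))

def min_col_vi(M):
    """Smallest variation of information between any two columns."""
    return min(variation_of_information(a, b)
               for a, b in itertools.combinations(M.T, 2))

def decode(M, bits):
    """Classical ECOC decoding: index of the nearest codeword in Hamming distance."""
    return int(np.argmin(np.sum(M != bits, axis=1)))

# Hypothetical code matrix: 4 classes (rows), 7 binary classifiers (columns).
M = np.array([
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
])

d_min = min_row_hamming(M)
print("min row Hamming distance:", d_min)        # 4
print("correctable bit errors:", (d_min - 1) // 2)  # 1
print("min column VI:", min_col_vi(M))

# Flip one bit of class 2's codeword, as if one binary classifier were fooled.
noisy = M[2].copy()
noisy[0] ^= 1
print("decoded class:", decode(M, noisy))         # 2
```

With a minimum row distance of 4, any single wrong binary classifier output still decodes to the correct class, which is the fault-tolerance argument for maximizing row distances. A nonzero minimum column distance means no two classifiers are asked to learn the same split of the labels, which is the diversity argument for maximizing column distances.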
