A non-discriminatory approach to ethical deep learning
This work addresses ethical concerns in AI applications such as image classification, though the contribution is incremental because it builds on existing regularization methods.
The paper tackles the problem of discriminatory features in deep learning models by proposing NDR, a non-discriminatory regularization strategy that hides discriminatory information to prevent models from using such features, achieving minimal computational overhead and performance loss.
Artificial neural networks (ANNs) achieve state-of-the-art performance in an ever-growing number of tasks and are now used to solve a very wide variety of problems. However, typical training strategies do not take into account the legal, ethical and discriminatory issues that trained ANN models may incur. In this work we propose NDR, a non-discriminatory regularization strategy that prevents the ANN model from solving the target task using discriminatory features such as, for example, ethnicity in an image classification task on human faces. In particular, a part of the ANN model is trained to hide the discriminatory information so that the rest of the network focuses on learning the given task. Our experiments show that NDR can be exploited to achieve non-discriminatory models with both minimal computational overhead and minimal performance loss.
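The idea of training part of a network to hide discriminatory information can be illustrated with a generic regularized loss. The sketch below is not the NDR formulation, which this text does not specify. It assumes a simple stand-in penalty: the squared distance between the mean intermediate feature vectors of two protected groups, added to the task loss with a weight. The names `group_mean_gap_penalty`, `regularized_loss`, `lam`, and the binary group labels are all hypothetical and chosen only for illustration.

```python
def group_mean_gap_penalty(features, groups):
    """Squared distance between the mean feature vectors of groups 0 and 1.

    Assumption: this is a simple proxy for "discriminatory information"
    in an intermediate layer, NOT the regularizer used by NDR.
    features: list of equal-length lists of floats (one row per sample)
    groups:   list of 0/1 protected-attribute labels, one per sample
    """
    dim = len(features[0])
    means = {}
    for g in (0, 1):
        rows = [f for f, gi in zip(features, groups) if gi == g]
        if not rows:
            # With one group missing from the batch there is no gap to measure.
            return 0.0
        means[g] = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(means[0], means[1]))


def regularized_loss(task_loss, features, groups, lam=0.1):
    """Task loss plus a weighted penalty (lam is a hypothetical hyperparameter)."""
    return task_loss + lam * group_mean_gap_penalty(features, groups)


# Toy batch: the second feature separates the two groups, the first does not.
batch_features = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
batch_groups = [0, 0, 1, 1]
penalty = group_mean_gap_penalty(batch_features, batch_groups)
total = regularized_loss(1.0, batch_features, batch_groups, lam=0.5)
```

In this toy batch only the second feature differs between the groups, so the penalty comes entirely from that dimension. Minimizing such a term alongside the task loss pushes the regularized layer toward features whose group means coincide, which is one crude way to picture "hiding" the protected attribute from the layers that follow.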