Information Removal at the bottleneck in Deep Neural Networks
This addresses privacy and bias issues in deep learning by enabling information removal, though it appears incremental as it builds on existing methods for feature control.
The authors tackled the problem of controlling the propagation of specific features like gender or race in deep neural networks, proposing IRENE to minimize mutual information between private features and the target, with experiments on synthetic data and CelebA validating its effectiveness.
Deep learning models are nowadays broadly deployed to solve an incredibly large variety of tasks. Commonly, leveraging over the availability of "big data", deep neural networks are trained as black-boxes, minimizing an objective function at its output. This however does not allow control over the propagation of some specific features through the model, like gender or race, for solving some an uncorrelated task. This raises issues either in the privacy domain (considering the propagation of unwanted information) and of bias (considering that these features are potentially used to solve the given task). In this work we propose IRENE, a method to achieve information removal at the bottleneck of deep neural networks, which explicitly minimizes the estimated mutual information between the features to be kept ``private'' and the target. Experiments on a synthetic dataset and on CelebA validate the effectiveness of the proposed approach, and open the road towards the development of approaches guaranteeing information removal in deep neural networks.