LG · Apr 19, 2021

Metadata Normalization

arXiv:2104.09052v2 · 21 citations
Originality: Incremental advance
AI Analysis

The paper addresses bias and confounding effects in models, such as race when classifying gender from face images, but the advance is incremental because it builds on batch normalization techniques.

The paper tackles bias introduced into deep learning features by extraneous variables (metadata), which batch normalization does not address. It introduces Metadata Normalization (MDN), which regresses out these effects during training, and demonstrates their removal on four diverse datasets.

Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metadata here, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation which can be used end-to-end within the training framework, to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We utilize a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects on four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.
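The "regress out" idea in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration, not the authors' MDN layer: it fits an ordinary least-squares model on a single batch with only metadata and an intercept as regressors, and it assumes the metadata effect is linear. The function names `residualize` and `distance_correlation`, the synthetic data, and the mixing weights are all hypothetical choices made for this example.

```python
import numpy as np


def residualize(features, metadata):
    """Subtract the linear metadata component from every feature column.

    features: (n, d) array, metadata: (n, k) array.
    Simplifying assumption: plain per-batch least squares.
    """
    n, k = metadata.shape
    # Design matrix: metadata columns followed by an intercept column.
    design = np.column_stack([metadata, np.ones(n)])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    # Remove only the metadata part; the intercept stays in the features.
    return features - metadata @ beta[:k]


def distance_correlation(a, b):
    """Sample distance correlation between two sets of row-aligned samples."""

    def centered_distances(x):
        x = x.reshape(len(x), -1)
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        # Double-center the pairwise distance matrix.
        return (d - d.mean(axis=0, keepdims=True)
                - d.mean(axis=1, keepdims=True) + d.mean())

    A, B = centered_distances(a), centered_distances(b)
    dcov2 = (A * B).mean()
    norm = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if norm == 0 else float(np.sqrt(max(dcov2, 0.0) / norm))


# Synthetic batch: features are noise plus a linear metadata effect.
rng = np.random.default_rng(0)
metadata = rng.normal(size=(256, 1))
weights = np.array([[2.0, -1.0, 0.5, 3.0]])  # hypothetical mixing weights
features = rng.normal(size=(256, 4)) + metadata @ weights

cleaned = residualize(features, metadata)
print("dcor before:", distance_correlation(features, metadata))
print("dcor after: ", distance_correlation(cleaned, metadata))
```

On this synthetic batch the distance correlation between features and metadata should drop after residualization, which is the kind of check the abstract describes. A layer used inside a network would also need to handle mini-batch estimation and backpropagation, which this sketch leaves out.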

Code Implementations: 1 repo
