Optimal whitening and decorrelation
This work addresses a foundational issue in statistical preprocessing, offering researchers and practitioners principled criteria for choosing an optimal whitening transformation.
The paper tackles the problem of selecting among infinitely many whitening transformations. It breaks the rotational invariance by examining the cross-covariance and cross-correlation matrices between the sphered and the original variables, and recommends ZCA-cor whitening for maximal similarity and PCA-cor whitening for maximal compression.
Whitening, or sphering, is a common preprocessing step in statistical analysis that transforms random variables to orthogonality, i.e. into uncorrelated variables with unit variance. However, because of rotational freedom there are infinitely many possible whitening procedures. Consequently, a diverse range of sphering methods is in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. We then demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables makes it possible to break the rotational invariance and to identify optimal whitening transformations. As a result, we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
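The whitening procedures named above and the cross-covariance and cross-correlation criteria can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a known, positive-definite covariance matrix Σ = V^{1/2} P V^{1/2} (V the diagonal matrix of variances, P the correlation matrix), writes each whitening as z = Wx with WΣWᵀ = I, and computes the cross-covariance Φ = WΣ and the cross-correlation Ψ = ΦV^{-1/2} between z and x. The helper names (`inv_sqrt_sym`, `sorted_eig`, `whitening_matrix`, `cross_cov_cor`), the eigenvector sign convention and the example matrix `SIGMA` are hypothetical choices made for this sketch.

```python
import numpy as np

METHODS = ("ZCA", "ZCA-cor", "PCA", "PCA-cor", "Cholesky")


def inv_sqrt_sym(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def sorted_eig(m):
    """Eigenpairs with eigenvalues in decreasing order. Each eigenvector's
    sign is flipped so the diagonal of the eigenvector matrix is >= 0
    (an assumed convention that makes the PCA-type whitenings unique)."""
    vals, vecs = np.linalg.eigh(m)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    signs = np.where(np.diag(vecs) < 0, -1.0, 1.0)
    return vals, vecs * signs  # multiplies column j by signs[j]


def whitening_matrix(sigma, method):
    """Return W with W @ sigma @ W.T = I for the named procedure."""
    v_isqrt = np.diag(np.diag(sigma) ** -0.5)  # V^{-1/2}
    rho = v_isqrt @ sigma @ v_isqrt            # correlation matrix P
    if method == "ZCA":
        return inv_sqrt_sym(sigma)             # Sigma^{-1/2}
    if method == "ZCA-cor":
        return inv_sqrt_sym(rho) @ v_isqrt     # P^{-1/2} V^{-1/2}
    if method == "PCA":
        lam, u = sorted_eig(sigma)
        return np.diag(lam ** -0.5) @ u.T      # Lambda^{-1/2} U^T
    if method == "PCA-cor":
        theta, g = sorted_eig(rho)
        return np.diag(theta ** -0.5) @ g.T @ v_isqrt  # Theta^{-1/2} G^T V^{-1/2}
    if method == "Cholesky":
        low = np.linalg.cholesky(np.linalg.inv(sigma))  # L L^T = Sigma^{-1}
        return low.T                                     # upper-triangular W = L^T
    raise ValueError(f"unknown method: {method}")


def cross_cov_cor(w, sigma):
    """Cross-covariance Phi = cov(z, x) and cross-correlation Psi = cor(z, x)."""
    phi = w @ sigma
    psi = phi @ np.diag(np.diag(sigma) ** -0.5)  # z already has unit variance
    return phi, psi


# Hypothetical covariance matrix, chosen only for illustration.
SIGMA = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.5],
                  [0.6, 0.5, 1.0]])

for name in METHODS:
    W = whitening_matrix(SIGMA, name)
    phi, psi = cross_cov_cor(W, SIGMA)
    # diag(Psi Psi^T): squared cross-correlations of each z_i, summed over all x_j
    cumulative = np.cumsum(np.diag(psi @ psi.T))
    print(f"{name:8s} tr(Phi)={np.trace(phi):.3f} tr(Psi)={np.trace(psi):.3f} "
          f"cumulative={np.round(cumulative, 3)}")
```

Every whitening matrix can be written as W = QΣ^{-1/2} or, equivalently, W = QP^{-1/2}V^{-1/2} with Q orthogonal, and tr(QA) ≤ tr(A) for positive semidefinite A. Hence ZCA attains the largest tr(Φ) and ZCA-cor the largest tr(Ψ), which is the sense of "maximally similar" used here. For PCA-cor, diag(ΨΨᵀ) equals the eigenvalues of P in decreasing order, so each partial sum is as large as any whitening can make it; treating these partial sums as the measure of compression is an assumption of this sketch.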