Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data
This work addresses shared-signal detection in data-science applications, offering incremental improvements in how to choose among self, cross, and joint covariance methods for linear correlation analysis.
The paper tackles the problem of detecting a signal shared between two high-dimensional variables in undersampled data. It shows that the joint and cross covariance matrices reconstruct the signal earlier than the self covariances, and that which of the two performs better depends on the mismatch of dimensionalities between the variables.
Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations, despite the background of sampling-noise-induced correlations. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariances, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self and the cross correlation blocks. We observe the expected Baik, Ben Arous, and Péché detectability phase transition in all these covariance matrices, and we show that the joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch of dimensionalities between the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
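The three constructions named in the abstract can be illustrated with a minimal numerical sketch. The abstract does not specify the generative model, so everything here is an assumption made for illustration: a single latent signal planted along unit directions `u` and `v` in the two variables, unit-variance Gaussian noise, and the helper names `simulate_shared_signal` and `leading_overlaps`. The sketch builds the self, cross, and joint sample covariances and measures how well the leading eigenvector (or singular vector) of each recovers `u`.

```python
import numpy as np


def simulate_shared_signal(n_samples, n_x, n_y, strength, rng):
    """Draw X and Y sharing one latent signal (assumed toy model, not the paper's)."""
    # Random unit directions along which the shared signal is planted.
    u = rng.standard_normal(n_x)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(n_y)
    v /= np.linalg.norm(v)
    z = rng.standard_normal(n_samples)  # latent shared signal, one value per sample
    X = strength * np.outer(z, u) + rng.standard_normal((n_samples, n_x))
    Y = strength * np.outer(z, v) + rng.standard_normal((n_samples, n_y))
    return X, Y, u, v


def leading_overlaps(X, Y, u):
    """Overlap |<estimate, u>| of the leading direction from each covariance."""
    n_samples, n_x = X.shape

    # Self covariance of X: leading eigenvector (eigh sorts eigenvalues ascending).
    c_xx = X.T @ X / n_samples
    u_self = np.linalg.eigh(c_xx)[1][:, -1]

    # Cross covariance between X and Y: leading left singular vector.
    c_xy = X.T @ Y / n_samples
    u_cross = np.linalg.svd(c_xy, full_matrices=False)[0][:, 0]

    # Joint covariance of the concatenated variable: take the X block of the
    # leading eigenvector and renormalize it to unit length.
    Z = np.hstack([X, Y])
    c_zz = Z.T @ Z / n_samples
    u_joint = np.linalg.eigh(c_zz)[1][:n_x, -1]
    u_joint = u_joint / np.linalg.norm(u_joint)

    return {
        "self": abs(u_self @ u),
        "cross": abs(u_cross @ u),
        "joint": abs(u_joint @ u),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Undersampled regime: fewer samples than dimensions in each variable.
    X, Y, u, v = simulate_shared_signal(200, 400, 300, 1.5, rng)
    for name, overlap in leading_overlaps(X, Y, u).items():
        print(f"{name:>5}: overlap with planted direction = {overlap:.3f}")
```

Sweeping `strength` or the ratio of samples to dimensions in this toy setup is one way to look for a detectability transition of the kind the abstract describes. The sketch does not reproduce the paper's analytical thresholds.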