On the estimation of correlation in a binary sequence model
This work addresses a basic estimation problem in binary data modeling and reveals a phase transition in estimability that depends on how coarsely the latent data are discretized.
The paper studies the estimation of a common correlation parameter from binary sequences generated by thresholding hidden continuous variables. It finds that maximum likelihood estimation does not yield consistent estimates, whereas trinary data permit consistent estimation at the parametric convergence rate.
We consider a binary sequence generated by thresholding a hidden continuous sequence. The hidden variables are assumed to have a compound symmetry covariance structure, with a single parameter characterizing their common correlation. We study the estimation of this parameter under such one-parameter models. We demonstrate that maximizing the likelihood function does not yield a consistent estimate of the correlation. We then formally prove that the parameter is not estimable by deriving a non-vanishing minimax lower bound. This counterintuitive phenomenon shows that one bit of information per latent variable is not sufficient to consistently recover their common correlation. On the other hand, we show that the correlation can be consistently estimated, at the parametric convergence rate, from trinary data generated from the hidden variables. We thus reveal a phase transition in the estimability of the correlation with respect to the discretization level of the latent continuous variables. Numerical experiments validate these conclusions.
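The contrast between one-bit and trinary observations can be illustrated with a short simulation. The sketch below is not the paper's estimator or proof. It assumes that the latent variables are standard Gaussian with a one-factor representation (which gives a compound symmetry covariance), that the binary threshold is zero, and that the two trinary thresholds are known. All function names are hypothetical. Under these assumptions, the long-run fraction of ones in the binary sequence depends on the correlation only through its product with a single unobserved shared draw, so different correlations can produce the same limit. With two thresholds, the difference of the two probit-transformed cumulative proportions cancels the shared draw and isolates the correlation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_latent(n, rho, rng, shared=None):
    """Draw n equicorrelated standard normals (assumed one-factor model).

    X_i = sqrt(rho) * shared + sqrt(1 - rho) * eps_i, so Corr(X_i, X_j) = rho.
    Passing `shared` fixes the common factor, which is useful for illustration.
    """
    if shared is None:
        shared = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n)]


def limiting_one_fraction(rho, shared):
    """Limit of the fraction of ones in 1{X_i > 0}, given the shared factor."""
    return STD_NORMAL.cdf(math.sqrt(rho) / math.sqrt(1.0 - rho) * shared)


def trinarize(x, lo, hi):
    """Map each latent value to 0, 1, or 2 using known thresholds lo < hi."""
    return [0 if v <= lo else (1 if v <= hi else 2) for v in x]


def estimate_rho_trinary(labels, lo, hi):
    """Illustrative plug-in estimate of rho from trinary labels.

    Given the shared factor, P(X <= t) = Phi((t - sqrt(rho) * shared) / sqrt(1 - rho)),
    so the probit difference between the two thresholds equals
    (hi - lo) / sqrt(1 - rho), which no longer involves the shared factor.
    """
    n = len(labels)
    p_lo = sum(1 for s in labels if s == 0) / n
    p_hi = sum(1 for s in labels if s <= 1) / n
    if not 0.0 < p_lo < p_hi < 1.0:
        raise ValueError("all three categories must be observed")
    gap = STD_NORMAL.inv_cdf(p_hi) - STD_NORMAL.inv_cdf(p_lo)
    return 1.0 - ((hi - lo) / gap) ** 2


if __name__ == "__main__":
    rng = random.Random(0)

    # Binary case: two different (rho, shared) pairs share one limiting fraction.
    print(limiting_one_fraction(0.2, 0.6), limiting_one_fraction(0.5, 0.3))

    # Trinary case: the estimate targets rho whatever the shared factor is.
    x = simulate_latent(200_000, 0.5, rng)
    print(estimate_rho_trinary(trinarize(x, -0.5, 0.5), -0.5, 0.5))
```

The binary part only shows why a single sequence cannot separate the correlation from the shared draw in this assumed model. The non-estimability result itself rests on the minimax lower bound described above.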