Scalable Algorithms for Learning High-Dimensional Linear Mixed Models
The work addresses scalability issues faced by researchers and practitioners who use LMMs in fields such as genetics and statistics. It is a strong, specific gain rather than a foundational breakthrough.
The paper tackles the computational inefficiency of learning high-dimensional linear mixed models (LMMs) on big data, achieving sublinear computational complexity in the covariate dimension with theoretical guarantees and experimental validation.
Linear mixed models (LMMs) are used extensively to model dependencies among observations in linear regression and appear in many application areas. Parameter estimation for LMMs can be computationally prohibitive on big data. State-of-the-art learning algorithms have computational complexity that depends at least linearly on the dimension $p$ of the covariates, and often use heuristics that do not offer theoretical guarantees. We present scalable algorithms for learning high-dimensional LMMs whose computational complexity is sublinear in $p$. Key to our approach are novel dual estimators which use only kernel functions of the data, and fast computational techniques based on the subsampled randomized Hadamard transform. We provide theoretical guarantees for our learning algorithms, demonstrating the robustness of parameter estimation. Finally, we complement the theory with experiments on large synthetic and real data.
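To make the subsampled randomized Hadamard transform (SRHT) concrete, here is a minimal sketch of a generic SRHT. It is not the paper's estimator or algorithm. The function names `fwht` and `srht_features`, the sketch size `k`, and the choice to compress the covariate axis of an $n \times p$ data matrix are all assumptions made for illustration, and $p$ is assumed to be a power of two.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along axis 0.

    The length of axis 0 must be a power of two. Returns a new array.
    """
    x = np.array(x, dtype=float)  # work on a copy
    n = x.shape[0]
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length of axis 0 must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h pairwise.
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x


def srht_features(X, k, rng):
    """Compress the p columns of X (shape n x p) down to k columns.

    Hypothetical helper: applies S = sqrt(p/k) * R * H * D to each row,
    where D is a random sign flip, H the orthonormal Hadamard matrix,
    and R samples k coordinates uniformly without replacement.
    """
    n, p = X.shape
    if not 1 <= k <= p:
        raise ValueError("k must satisfy 1 <= k <= p")
    signs = rng.choice([-1.0, 1.0], size=p)          # diagonal of D
    keep = rng.choice(p, size=k, replace=False)      # rows kept by R
    HDXt = fwht(signs[:, None] * X.T) / np.sqrt(p)   # H D X^T, shape p x n
    return (np.sqrt(p / k) * HDXt[keep]).T           # shape n x k
```

With `k = p` the map is an orthogonal transform, so the Gram (linear kernel) matrix `X @ X.T` is reproduced exactly. With `k < p` the product `Z @ Z.T` of the sketched matrix `Z` is an unbiased approximation of it, which is the sense in which a sketch of this kind can stand in for kernel computations on the full covariates.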