Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements
This work gives statisticians and researchers who analyze longitudinal data a more flexible, semiparametrically efficient way to estimate fixed effects in the presence of complex interactions. The contribution is incremental in that it builds on existing double machine learning techniques.
The authors estimate the fixed effects in partially linear mixed-effects models with repeated measurements by using double machine learning to adjust nonparametrically for the nonlinear variables. They prove that the estimated coefficient is asymptotically Gaussian and semiparametrically efficient. In simulation studies, their method outperforms a penalized regression spline approach in terms of coverage, and they apply it to a longitudinal HIV dataset.
Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model for repeated measurements. Using machine learning algorithms allows us to incorporate complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the partially linear mixed-effects model: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests, which makes it possible to take complex interaction terms and nonsmooth structures into account. The adjusted variables satisfy a linear mixed-effects model, in which the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed-effects coefficient converges at the parametric rate, is asymptotically Gaussian, and is semiparametrically efficient. Two simulation studies demonstrate that our method outperforms a penalized regression spline approach in terms of coverage. We also illustrate our proposed approach on a longitudinal dataset of HIV-infected individuals. Software code for our method is available in the R package dmlalg.
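The partialling-out idea described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the dmlalg implementation. The data-generating process, the function names (knn_predict, simulate, dml_partial_out), and all parameter values are hypothetical. A k-nearest-neighbour smoother stands in for a random forest so that the example needs only the Python standard library. Folds are assumed to be formed at the group level, so that all measurements of one subject stay in the same fold. The final stage is plain least squares on the adjusted variables, which is a deliberate simplification of the linear mixed-effects fit: it gives a point estimate but ignores the within-group correlation.

```python
import math
import random


def knn_predict(train_w, train_v, query_w, k=15):
    """Estimate E[v | w] at each query point by averaging the k nearest training values."""
    preds = []
    for q in query_w:
        nearest = sorted(range(len(train_w)), key=lambda i: abs(train_w[i] - q))[:k]
        preds.append(sum(train_v[i] for i in nearest) / len(nearest))
    return preds


def simulate(n_groups=200, n_obs=5, beta=0.5, seed=0):
    """Hypothetical data: Y = beta * X + g(W) + random intercept + noise."""
    rng = random.Random(seed)
    rows = []  # each row is (group, x, w, y)
    for group in range(n_groups):
        b = rng.gauss(0.0, 0.5)  # random intercept shared by all measurements of the group
        for _ in range(n_obs):
            w = rng.uniform(-2.0, 2.0)
            x = math.sin(2.0 * w) + rng.gauss(0.0, 1.0)
            g_w = math.cos(w) ** 2 + (1.0 if w > 0 else 0.0)  # nonsmooth nuisance function
            y = beta * x + g_w + b + rng.gauss(0.0, 0.5)
            rows.append((group, x, w, y))
    return rows


def dml_partial_out(rows, n_folds=2, k=15, seed=1):
    """Cross-fitted partialling-out estimate of beta (simplified final stage)."""
    groups = sorted({r[0] for r in rows})
    random.Random(seed).shuffle(groups)
    fold_of = {g: i % n_folds for i, g in enumerate(groups)}  # split at group level

    res_x, res_y = [], []
    for fold in range(n_folds):
        train = [r for r in rows if fold_of[r[0]] != fold]
        test = [r for r in rows if fold_of[r[0]] == fold]
        train_w = [r[2] for r in train]
        test_w = [r[2] for r in test]
        # Regress W out of X and out of Y using the other folds only.
        x_hat = knn_predict(train_w, [r[1] for r in train], test_w, k)
        y_hat = knn_predict(train_w, [r[3] for r in train], test_w, k)
        res_x += [r[1] - xh for r, xh in zip(test, x_hat)]
        res_y += [r[3] - yh for r, yh in zip(test, y_hat)]

    # Simplification: least squares on the adjusted variables instead of a mixed-effects fit.
    return sum(a * b for a, b in zip(res_x, res_y)) / sum(a * a for a in res_x)


rows = simulate()
beta_hat = dml_partial_out(rows)
print(f"beta_hat = {beta_hat:.3f} (true value 0.5)")
```

To obtain valid confidence intervals, the last step would have to be replaced by a linear mixed-effects fit on the adjusted variables, as the abstract describes. The sketch only shows why regressing the nonlinear variable out of both the linear variable and the response isolates the linear coefficient.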