Fréchet regression of multivariate distributions with nonparanormal transport
This work provides a new method for regression with multivariate distributional responses, relevant wherever data arrive as complex distributions, for example in health monitoring applications such as continuous glucose monitoring.
This paper introduces a new regression approach for multivariate distribution-valued responses with Euclidean predictors. It models the responses within the semiparametric nonparanormal family and measures distances with the nonparanormal transport (NPT) metric, which decomposes the problem efficiently into separate regressions for the marginals and for the dependence structure.
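To make the decomposition concrete, here is a minimal sketch of an NPT-style distance between two samples. It assumes that NPT combines the squared 2-Wasserstein distances between the univariate marginals with the squared Bures-Wasserstein distance between the latent Gaussian correlation matrices. The exact definition used in the paper may differ. All function names are hypothetical. The latent correlation is estimated from normal scores Φ⁻¹(rank/(n+1)), which is only one of several possible estimators.

```python
from statistics import NormalDist

import numpy as np


def marginal_w2_sq(x, y, grid_size=200):
    """Squared 2-Wasserstein distance between two univariate samples,
    approximated by averaging squared quantile differences on a midpoint grid."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    return float(np.mean((np.quantile(x, u) - np.quantile(y, u)) ** 2))


def latent_correlation(sample):
    """Estimate the Gaussian-copula correlation matrix of an (n, p) sample
    from the normal scores Phi^{-1}(rank / (n + 1)) of each column."""
    n = sample.shape[0]
    ranks = sample.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    scores = np.vectorize(NormalDist().inv_cdf)(ranks / (n + 1))
    return np.corrcoef(scores, rowvar=False)


def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def bures_wasserstein_sq(s1, s2):
    """Squared 2-Wasserstein distance between N(0, s1) and N(0, s2)."""
    r = psd_sqrt(s1)
    cross = psd_sqrt(r @ s2 @ r)
    return float(np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross))


def npt_distance(x, y):
    """Hypothetical NPT-style distance: marginal W2 terms plus a
    Bures-Wasserstein term between latent correlation matrices (assumed form)."""
    marg = sum(marginal_w2_sq(x[:, j], y[:, j]) for j in range(x.shape[1]))
    dep = bures_wasserstein_sq(latent_correlation(x), latent_correlation(y))
    return float(np.sqrt(max(marg + dep, 0.0)))  # clip tiny negative round-off
```

Only ranks enter the dependence term. Shifting every coordinate of a two-dimensional sample by 1 therefore changes only the two marginal terms, each by 1, and gives a distance of about √2. The two parts can thus be computed, and regressed, separately.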
Regression with distribution-valued responses and Euclidean predictors has gained increasing scientific relevance. While methodology for univariate distributional data has advanced rapidly in recent years, multivariate distributions, which additionally encode dependence across univariate marginals, have received less attention and pose computational and statistical challenges. In this work, we address these challenges with a new regression approach for multivariate distributional responses, in which distributions are modeled within the semiparametric nonparanormal family. By incorporating the nonparanormal transport (NPT) metric, an efficient closed-form surrogate for the Wasserstein distance, into the Fréchet regression framework, our approach decomposes the problem into separate regressions of the marginal distributions and of their dependence structure, facilitating both efficient estimation and granular interpretation of predictor effects. We provide theoretical justification for NPT, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality. We further prove uniform convergence guarantees for the regression estimators, both when distributional responses are fully observed and when they are estimated from empirical samples, attaining fast convergence rates comparable to those in the univariate case. The utility of our method is demonstrated via simulations and an application to continuous glucose monitoring data.
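The marginal half of the decomposition can be illustrated with a sketch of Fréchet regression for univariate distributions under the 2-Wasserstein metric. This sketch makes several assumptions. It uses the global (linear) Fréchet regression variant with weights s_i(x) = 1 + (X_i − X̄)ᵀ S⁻¹ (x − X̄). Each marginal response is represented by its quantile function on a fixed uniform grid. The weighted average is projected onto nondecreasing sequences by pool-adjacent-violators. The paper's exact estimator, and its regression of the dependence structure, are not reproduced here. All function names are hypothetical.

```python
import numpy as np


def global_frechet_weights(X, x_new):
    """Weights s_i(x) = 1 + (X_i - Xbar)^T S^{-1} (x - Xbar) for an (n, p)
    predictor matrix X, with S the (1/n)-normalized sample covariance."""
    xbar = X.mean(axis=0)
    centered = X - xbar
    S = centered.T @ centered / X.shape[0]
    return 1.0 + centered @ np.linalg.solve(S, np.asarray(x_new, dtype=float) - xbar)


def pava_nondecreasing(y):
    """Least-squares projection of a sequence onto nondecreasing sequences
    (pool-adjacent-violators with equal weights)."""
    blocks = []  # each block is [mean, size]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(size, mean) for mean, size in blocks])


def predict_marginal_quantiles(X, Q, x_new):
    """Q is (n, K): row i is the quantile function of response i's marginal
    on a shared uniform grid. Returns the fitted quantile function at x_new."""
    s = global_frechet_weights(X, x_new)
    raw = (s[:, None] * Q).mean(axis=0)  # weighted average of quantile functions
    return pava_nondecreasing(raw)       # restore monotonicity if it is violated
```

When the marginal quantile functions shift linearly in the predictor, for example Q_i(u) = 2 + 3X_i + z(u), the weighted average already recovers 2 + 3x + z(u) exactly, and the projection step leaves it unchanged. The projection only matters when extrapolation or noise produces a non-monotone average.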