ML IT ST · May 9, 2016

Inference of High-dimensional Autoregressive Generalized Linear Models

Eric C. Hall, Garvesh Raskutti, Rebecca Willett

arXiv:1605.02693v2 · 10.311 citations

Originality: Incremental advance

AI Analysis

The paper addresses the lack of statistical guarantees for network inference in non-Gaussian settings. The advance is incremental but matters for applications involving social, epidemiological, financial, or biological networks.

The paper tackles the problem of inferring autoregressive parameters and network structure in non-Gaussian time series, such as Poisson and Bernoulli processes. It derives sample complexity bounds for a sparsity-regularized maximum likelihood estimator and supports them with simulation studies that characterize how network parameters affect estimator performance.

Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often these models are used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees on estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity regularization is well studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modern empirical process techniques for dependent random variables. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on estimator performance.
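
To make the setting concrete, here is a minimal sketch of a Poisson autoregressive process and an L1-regularized maximum likelihood fit. It is an illustration under stated assumptions, not the authors' implementation: the log-linear rate exp(nu + A x_t), the nonpositive network entries, the known offset nu, the proximal gradient (ISTA) solver, and all names, step sizes, and the penalty weight `lam` are choices made here for the example.

```python
import numpy as np

def simulate_poisson_ar(A, nu, T, rng):
    """Simulate X[t+1] ~ Poisson(exp(nu + A @ X[t])) for T steps (assumed model form)."""
    M = A.shape[0]
    X = np.zeros((T + 1, M))
    for t in range(T):
        X[t + 1] = rng.poisson(np.exp(nu + A @ X[t]))
    return X

def neg_log_lik(A, X, nu):
    """Average Poisson negative log-likelihood, dropping terms that do not depend on A."""
    eta = nu + X[:-1] @ A.T                      # log-rates, shape (T, M)
    return np.mean(np.sum(np.exp(eta) - X[1:] * eta, axis=1))

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_1 (entrywise shrinkage toward zero)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def fit_l1_mle(X, nu, lam, step=0.02, n_iter=1000):
    """Proximal gradient descent on neg_log_lik(A) + lam * ||A||_1."""
    past, future = X[:-1], X[1:]
    T, M = past.shape
    A = np.zeros((M, M))
    for _ in range(n_iter):
        eta = nu + past @ A.T
        grad = (np.exp(eta) - future).T @ past / T   # gradient of the smooth part
        A = soft_threshold(A - step * grad, step * lam)
    return A

# Hypothetical 5-node network with a few inhibitory (negative) edges.
rng = np.random.default_rng(0)
M, T, nu, lam = 5, 2000, 0.0, 0.02
A_true = np.zeros((M, M))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)]:
    A_true[i, j] = -0.5

X = simulate_poisson_ar(A_true, nu, T, rng)
A_hat = fit_l1_mle(X, nu, lam)
print("estimation error (Frobenius):", np.linalg.norm(A_hat - A_true))
```

Restricting the example to negative entries keeps every rate at or below exp(nu), so the simulated counts stay bounded; this is a convenience for the sketch. The successive rows of `X` are dependent, which is the feature that the abstract says prevents standard sparsity-regularization analyses from applying directly.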
