LG ST ML May 8, 2019

Regression from Dependent Observations

Constantinos Daskalakis, Nishanth Dikkala, Ioannis Panageas

arXiv:1905.03353v2 15.835 citations

Originality: Highly original

AI Analysis

This paper addresses a key limitation of regression models in applications such as social networks and finance, where dependencies among responses are common. Earlier approaches obtain strong consistency only when multiple independent samples of the dependent responses are available; this work needs a single sample.

The paper tackles linear and logistic regression when the response variables are dependent, as in network data. It presents computationally and statistically efficient methods that, under mild assumptions, achieve strong consistency for recovering the coefficients and the strength of the dependencies, matching the rates of standard regression with independent observations.

The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the response variables are independent is, however, too strong. In many applications, these responses are collected on nodes of a network, or some spatial or temporal domain, and are dependent. Examples abound in financial and meteorological applications, and dependencies naturally arise in social networks through peer effects. Regression with dependent responses has thus received a lot of attention in the Statistics and Economics literature, but there are no strong consistency results unless multiple independent samples of the vectors of dependent responses can be collected from these models. We present computationally and statistically efficient methods for linear and logistic regression models when the response variables are dependent on a network. Given one sample from a networked linear or logistic regression model and under mild assumptions, we prove strong consistency results for recovering the vector of coefficients and the strength of the dependencies, recovering the rates of standard regression under independent observations. We use projected gradient descent on the negative log-likelihood, or negative log-pseudolikelihood, and establish their strong convexity and consistency using concentration of measure for dependent random variables.
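
The logistic case described in the abstract can be illustrated with a minimal sketch of projected gradient descent on a negative log-pseudolikelihood. Everything specific here is an assumption made for illustration, not the authors' exact formulation: responses y_i take values in {-1, +1}; the conditional law is taken to be P(y_i = 1 | y_-i) = 1 / (1 + exp(-2 m_i)) with m_i = theta^T x_i + beta * sum_j A_ij y_j; A is a known symmetric interaction matrix with zero diagonal; and the feasible set is an L2 ball for theta together with an interval for beta. The names `fit_pgd`, `project`, `theta_radius`, and `beta_bound` are hypothetical.

```python
import numpy as np


def neg_log_pseudolikelihood(theta, beta, X, A, y):
    """Average of -log P(y_i | y_-i) under the assumed conditional model."""
    m = X @ theta + beta * (A @ y)
    # -log sigmoid(2 * y_i * m_i) = log(1 + exp(-2 * y_i * m_i)), computed stably
    return float(np.mean(np.logaddexp(0.0, -2.0 * y * m)))


def project(theta, beta, theta_radius, beta_bound):
    """Euclidean projection onto {||theta||_2 <= R} x [-B, B] (assumed constraint set)."""
    norm = np.linalg.norm(theta)
    if norm > theta_radius:
        theta = theta * (theta_radius / norm)
    beta = float(np.clip(beta, -beta_bound, beta_bound))
    return theta, beta


def fit_pgd(X, A, y, theta_radius=5.0, beta_bound=1.0, step=0.1, iters=500):
    """Projected gradient descent on the negative log-pseudolikelihood."""
    n, d = X.shape
    theta, beta = np.zeros(d), 0.0
    Ay = A @ y  # neighbour influence; fixed because y is the single observed sample
    for _ in range(iters):
        m = X @ theta + beta * Ay
        # d/dm_i of log(1 + exp(-2 * y_i * m_i)) equals tanh(m_i) - y_i for y_i in {-1, +1}
        residual = np.tanh(m) - y
        grad_theta = X.T @ residual / n
        grad_beta = float(Ay @ residual) / n
        theta, beta = project(
            theta - step * grad_theta,
            beta - step * grad_beta,
            theta_radius,
            beta_bound,
        )
    return theta, beta
```

Calling `fit_pgd(X, A, y)` with an `(n, d)` covariate matrix, an `(n, n)` interaction matrix, and a length-`n` vector of ±1 responses returns the estimated pair `(theta, beta)`. The fixed step size and iteration count are placeholders; the conditions on A, the constraint radii, and the step size under which strong convexity and consistency hold are specified in the paper and are not reproduced here.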
