Statistical Estimation from Dependent Data
This work addresses statistical estimation when observations are dependent, as in social network analysis or spatial modeling, and offers a novel algorithmic approach that improves on dependency-agnostic regression on real networked data.
The paper tackles the problem of statistical estimation from dependent data, such as spatial, temporal, or networked observations, by modeling the dependencies with Markov Random Fields and providing algorithms with statistically efficient estimation rates. It demonstrates improved performance over standard regression methods that ignore dependencies on three real networked text classification datasets: Cora, Citeseer, and Pubmed.
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where, e.g., these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e., we do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e., external fields and interaction strengths) of Ising models from a single sample. We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer, and Pubmed.
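To make the setting concrete, the following is a minimal sketch of estimating an external-field parameter and an interaction strength of an Ising model from one sample of labels. It rests on several assumptions that the abstract does not state: labels take values in {-1, +1}; the external field at node i is linear in its features (theta · x_i), as in the logistic regression instantiation; the interaction matrix A is known, symmetric, and has zero diagonal; a single scalar beta scales all interactions; and the estimator is maximum pseudo-likelihood fitted by plain gradient descent. All function and variable names are hypothetical, and this illustrates the general idea rather than the paper's algorithm.

```python
import numpy as np


def gibbs_sample(theta, beta, X, A, sweeps=50, rng=None):
    """Draw one approximate sample y in {-1,+1}^n from the assumed model
    P(y) proportional to exp(sum_i y_i * theta.x_i + beta * sum_{i<j} A_ij y_i y_j)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    y = rng.choice([-1.0, 1.0], size=n)
    field = X @ theta  # external field at each node
    for _ in range(sweeps):
        for i in range(n):
            # Local field at node i; A has zero diagonal, so y[i] is excluded.
            h = field[i] + beta * (A[i] @ y)
            # P(y_i = +1 | rest) = sigmoid(2h) = (1 + tanh(h)) / 2
            p_plus = 0.5 * (1.0 + np.tanh(h))
            y[i] = 1.0 if rng.random() < p_plus else -1.0
    return y


def neg_log_pseudolikelihood(theta, beta, X, A, y):
    """Sum over nodes of -log P(y_i | y_{-i}) = log(1 + exp(-2 y_i h_i))."""
    h = X @ theta + beta * (A @ y)
    return float(np.sum(np.logaddexp(0.0, -2.0 * y * h)))


def fit_mple(X, A, y, steps=2000, lr=0.05):
    """Gradient descent on the mean negative log pseudo-likelihood.
    The objective is convex in (theta, beta); lr is an untuned assumption."""
    n, d = X.shape
    theta = np.zeros(d)
    beta = 0.0
    m = A @ y  # neighbor sums, fixed because only one sample y is observed
    for _ in range(steps):
        h = X @ theta + beta * m
        # sigmoid(-2 y h) written stably as (1 - tanh(y h)) / 2
        w = 0.5 * (1.0 - np.tanh(y * h))
        g = -2.0 * y * w  # derivative of the per-node loss w.r.t. h_i
        theta -= lr * (X.T @ g) / n
        beta -= lr * (m @ g) / n
    return theta, beta
```

With beta fixed at 0 the objective reduces to ordinary logistic regression on ±1 labels (up to a factor of 2 in the parameterization), which is the dependency-agnostic baseline the abstract compares against. A typical call would build A from the network's adjacency structure, observe one label vector y, and run `theta_hat, beta_hat = fit_mple(X, A, y)`.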