ST CO ME ML | Apr 25, 2016

Learning Local Dependence In Ordered Data

arXiv:1604.07451v37.336 citations

Originality: Highly original

AI Analysis

This work addresses the challenge of modeling varying local dependence in ordered data, such as genomic or sound recording data, offering a flexible and efficient solution with applications in fields like bioinformatics and classification.

The authors propose a penalized maximum likelihood framework that estimates the inverse of the Cholesky factor of the covariance matrix. The result is a sparse, symmetric, positive definite estimator of the precision matrix, with theoretical guarantees and favorable empirical performance compared with existing methods.

In many applications, data come with a natural ordering. This ordering can often induce local dependence among nearby variables. However, in complex data, the width of this dependence may vary, making simple assumptions such as a constant neighborhood size unrealistic. We propose a framework for learning this local dependence based on estimating the inverse of the Cholesky factor of the covariance matrix. Penalized maximum likelihood estimation of this matrix yields a simple regression interpretation for local dependence in which variables are predicted by their neighbors. Our proposed method involves solving a convex, penalized Gaussian likelihood problem with a hierarchical group lasso penalty. The problem decomposes into independent subproblems which can be solved efficiently in parallel using first-order methods. Our method yields a sparse, symmetric, positive definite estimator of the precision matrix, encoding a Gaussian graphical model. We derive theoretical results not found in existing methods attaining this structure. In particular, our conditions for signed support recovery and estimation consistency rates in multiple norms are as mild as those in a regression problem. Empirical results show our method performing favorably compared to existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium. Our method is also applied to improve the performance of discriminant analysis in sound recording classification.
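The regression interpretation in the abstract can be illustrated with a small numerical sketch. It assumes the common parametrization in which the precision matrix is written as L^T L for a lower-triangular L with positive diagonal, so that row r of L encodes the regression of variable r on the variables before it. The function names, the bandwidths, and the random entries are hypothetical and chosen only for illustration. The sketch does not fit the penalized estimator; it only shows the structure such an estimator would have.

```python
import numpy as np


def banded_cholesky_factor(bandwidths, rng):
    """Build a lower-triangular L with a row-specific bandwidth.

    Row r is nonzero only on the diagonal and on its bandwidths[r]
    immediate predecessors, mimicking local dependence whose width
    varies along the ordering. (Hypothetical helper, illustration only.)
    """
    p = len(bandwidths)
    L = np.zeros((p, p))
    for r, k in enumerate(bandwidths):
        k = min(k, r)  # row r has at most r predecessors
        L[r, r - k:r] = rng.uniform(-0.5, 0.5, size=k)
        L[r, r] = 1.0  # positive diagonal keeps L invertible
    return L


def regression_coefficients(L, r):
    """Coefficients for predicting variable r from variables 0..r-1.

    If L @ x = eps with eps ~ N(0, I), then
    x[r] = -sum_m (L[r, m] / L[r, r]) * x[m] + eps[r] / L[r, r].
    """
    return -L[r, :r] / L[r, r]


rng = np.random.default_rng(0)
bandwidths = [0, 1, 2, 1, 3, 0]  # assumed varying neighborhood sizes
L = banded_cholesky_factor(bandwidths, rng)

# Precision matrix implied by L: symmetric and positive definite by construction.
omega = L.T @ L

# Variable 3 depends only on its single nearest predecessor (bandwidth 1).
coef_3 = regression_coefficients(L, 3)
```

Here a zero in `regression_coefficients(L, r)` means the corresponding earlier variable is not used to predict variable `r`, which is how sparsity in L translates into local dependence.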
