ME AP ML Sep 10, 2021

Interaction Models and Generalized Score Matching for Compositional Data

arXiv:2109.04671v11.2

Originality: Incremental advance

AI Analysis

This work addresses the need for statistical methods that model interactions in compositional data, a need driven by applications such as microbiome analysis. Its advance is incremental: it extends existing score matching techniques to the simplex domain.

The authors model interactions in compositional data, such as microbiome data, by proposing a class of exponential family models supported on the probability simplex. Because the normalizing constant of these densities is difficult to compute, they estimate the models with generalized versions of score matching. Their high-dimensional analysis shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.

Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions. Generally, the distributions we consider have a density that features a difficult to compute normalizing constant. To circumvent this issue, we design effective estimation methods based on generalized versions of score matching. A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
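The role of the normalizing constant can be illustrated with a minimal Python sketch. It uses the Dirichlet distribution, which the abstract names as a special case, and checks numerically that the gradient of the log-density with respect to the data does not depend on the normalizing constant. This is the basic property that score matching methods rely on. The function names and the values of `alpha` and `x` are hypothetical, chosen only for illustration. The sketch differentiates in each coordinate as if `x` were an unconstrained vector, which is a simplification: it does not implement the generalized score matching estimator of the paper and does not handle the simplex constraint.

```python
import math
import numpy as np

def dirichlet_log_unnormalized(x, alpha):
    """Unnormalized Dirichlet log-density: sum_j (alpha_j - 1) * log(x_j)."""
    return float(np.sum((alpha - 1.0) * np.log(x)))

def dirichlet_log_normalizer(alpha):
    """Log normalizing constant; closed form for the Dirichlet, but not in general."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(float(np.sum(alpha)))

def dirichlet_log_density(x, alpha):
    """Fully normalized Dirichlet log-density."""
    return dirichlet_log_unnormalized(x, alpha) - dirichlet_log_normalizer(alpha)

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences, one coordinate at a time.

    Assumption: x is treated as a free vector, ignoring the sum-to-one constraint.
    """
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        grad[j] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Hypothetical parameter vector and one composition (nonnegative, sums to 1).
alpha = np.array([2.0, 3.5, 1.5])
x = np.array([0.2, 0.5, 0.3])

grad_full = numerical_gradient(lambda z: dirichlet_log_density(z, alpha), x)
grad_unnorm = numerical_gradient(lambda z: dirichlet_log_unnormalized(z, alpha), x)
grad_exact = (alpha - 1.0) / x  # analytic gradient of the unnormalized log-density

print(np.allclose(grad_full, grad_unnorm))
print(np.allclose(grad_full, grad_exact, atol=1e-5))
```

Because the constant is additive in the log-density and does not involve `x`, it drops out of the gradient, so an objective built from such gradients can be evaluated without ever computing it.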
