STML | Nov 2, 2014

A General Framework for Mixed Graphical Models

arXiv:1411.0288v1 | 21 citations
Originality: Incremental advance
AI Analysis

This work addresses a foundational gap in statistical modeling for fields such as genomics and proteomics, where mixed data are common but joint modeling frameworks are lacking. It is, however, an incremental advance that builds on existing node-conditional methods.

The authors tackle the problem of jointly modeling mixed data types (e.g., count, binary, continuous) by introducing Block Directed Markov Random Fields (BDMRFs), a novel class of models whose Markov independence graphs combine directed and undirected edges to capture dependencies. Simulations and an application to genomic data demonstrate their versatility.
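To make the combination of directed and undirected edges concrete, here is a schematic factorization. It assumes pairwise interactions, with sufficient statistics $B_s$ and base measures $C_s$ as in the node-conditional exponential-family construction of Yang et al. (2012); the block notation ($C_1,\dots,C_K$ for blocks in a topological order of the block DAG, $\mathrm{pa}(j)$ for the variables in blocks pointing into $C_j$, $E_j$ for undirected edges inside $C_j$) is introduced here for illustration and may not match the paper's exact formulation.

```latex
P(X) = \prod_{j=1}^{K} P\bigl(X_{C_j} \mid X_{\mathrm{pa}(j)}\bigr),
\qquad
P\bigl(X_{C_j} \mid X_{\mathrm{pa}(j)}\bigr) \propto
\exp\Bigl\{ \sum_{s \in C_j} \Bigl(\theta_s + \sum_{u \in \mathrm{pa}(j)} \theta_{su}\, B_u(X_u)\Bigr) B_s(X_s)
 + \sum_{(s,t) \in E_j} \theta_{st}\, B_s(X_s)\, B_t(X_t)
 + \sum_{s \in C_j} C_s(X_s) \Bigr\}
```

In this sketch, a nonzero $\theta_{su}$ plays the role of a directed edge from a parent-block variable $X_u$ into block $C_j$, while a nonzero $\theta_{st}$ with $(s,t) \in E_j$ is an undirected edge within the block. Each block conditional has to be normalizable for every value of its parents, which is the kind of existence condition the abstract refers to.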

"Mixed Data" comprising a large number of heterogeneous variables (e.g. count, binary, continuous, skewed continuous, among other data types) are prevalent in varied areas such as genomics and proteomics, imaging genetics, national security, social networking, and Internet advertising. There have been limited efforts at statistically modeling such mixed data jointly, in part because of the lack of computationally amenable multivariate distributions that can capture direct dependencies between such mixed variables of different types. In this paper, we address this by introducing a novel class of Block Directed Markov Random Fields (BDMRFs). Using the basic building block of node-conditional univariate exponential families from Yang et al. (2012), we introduce a class of mixed conditional random field distributions, that are then chained according to a block-directed acyclic graph to form our class of Block Directed Markov Random Fields (BDMRFs). The Markov independence graph structure underlying a BDMRF thus has both directed and undirected edges. We introduce conditions under which these distributions exist and are normalizable, study several instances of our models, and propose scalable penalized conditional likelihood estimators with statistical guarantees for recovering the underlying network structure. Simulations as well as an application to learning mixed genomic networks from next generation sequencing expression data and mutation data demonstrate the versatility of our methods.
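The "penalized conditional likelihood estimators" mentioned in the abstract can be illustrated with node-wise ℓ1-penalized regressions, one per variable, each using that variable's own exponential family. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: sufficient statistics are taken to be B(x) = x, Gaussian nodes are assumed to have unit variance, only Bernoulli, Gaussian and Poisson nodes are handled, the optimizer is plain proximal gradient (ISTA) with backtracking, and all names (`fit_node`, `neighborhood_selection`, `simulate_toy`, and so on) are hypothetical. All variables are treated as a single block, so every estimated edge is undirected.

```python
import math
import random

# Hypothetical sketch: node s has a univariate exponential family with B(x) = x (assumed)
# and natural parameter eta = theta_0 + sum_t theta_t * x_t over the other variables t.

def log_partition(family, eta):
    """Log-partition A(eta) of the assumed node-conditional family."""
    if family == "bernoulli":
        return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))  # stable log(1 + e^eta)
    if family == "gaussian":  # unit variance assumed
        return 0.5 * eta * eta
    if family == "poisson":
        return math.exp(eta)  # raises OverflowError for very large eta
    raise ValueError(f"unknown family: {family}")

def mean_param(family, eta):
    """A'(eta), the conditional mean of the node given the other variables."""
    if family == "bernoulli":
        z = math.exp(-abs(eta))  # numerically stable logistic function
        return 1.0 / (1.0 + z) if eta >= 0 else z / (1.0 + z)
    if family == "gaussian":
        return eta
    if family == "poisson":
        return math.exp(eta)
    raise ValueError(f"unknown family: {family}")

def soft_threshold(v, t):
    """Proximal operator of t * |v|; returns zero when |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def neg_loglik(family, y, X, theta):
    """Average negative conditional log-likelihood, dropping the C(x) terms."""
    total = 0.0
    try:
        for yi, xi in zip(y, X):
            eta = theta[0] + sum(b * x for b, x in zip(theta[1:], xi))
            total += log_partition(family, eta) - yi * eta
    except OverflowError:
        return math.inf  # lets the line search reject the step
    return total / len(y)

def gradient(family, y, X, theta):
    """Gradient of neg_loglik with respect to theta."""
    g = [0.0] * len(theta)
    for yi, xi in zip(y, X):
        eta = theta[0] + sum(b * x for b, x in zip(theta[1:], xi))
        r = mean_param(family, eta) - yi
        g[0] += r
        for j, x in enumerate(xi, start=1):
            g[j] += r * x
    return [gj / len(y) for gj in g]

def fit_node(family, y, X, lam, max_iter=300, tol=1e-8):
    """ISTA with backtracking; returns [theta_0, ..., theta_p], theta_0 unpenalized."""
    theta = [0.0] * (len(X[0]) + 1)
    f, step = neg_loglik(family, y, X, theta), 1.0
    for _ in range(max_iter):
        g = gradient(family, y, X, theta)
        while True:  # halve the step until the quadratic upper bound holds
            cand = [theta[0] - step * g[0]] + [soft_threshold(t - step * gj, step * lam)
                                               for t, gj in zip(theta[1:], g[1:])]
            diff = [c - t for c, t in zip(cand, theta)]
            f_cand = neg_loglik(family, y, X, cand)
            quad = sum(d * d for d in diff) / (2 * step)
            if f_cand <= f + sum(a * d for a, d in zip(g, diff)) + quad:
                break
            step *= 0.5
        theta, f = cand, f_cand
        if max(abs(d) for d in diff) < tol:
            break
    return theta

def neighborhood_selection(data, families, lam, rule="or"):
    """Fit one penalized GLM per node; combine supports with the OR or AND rule."""
    p = len(families)
    nbrs = []
    for s in range(p):
        others = [t for t in range(p) if t != s]
        y = [row[s] for row in data]
        X = [[row[t] for t in others] for row in data]
        coef = fit_node(families[s], y, X, lam)[1:]
        nbrs.append({t for t, c in zip(others, coef) if c != 0.0})
    edges = set()
    for s in range(p):
        for t in nbrs[s]:
            if rule == "or" or s in nbrs[t]:
                edges.add((min(s, t), max(s, t)))
    return edges

def simulate_toy(n, seed=0):
    """Toy data: binary y depends on Gaussian x1; Gaussian x2 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append([int(rng.random() < mean_param("bernoulli", 2.0 * x1)), x1, x2])
    return rows
```

For example, `neighborhood_selection(simulate_toy(500), ["bernoulli", "gaussian", "gaussian"], lam=0.15)` is expected to contain the edge (0, 1), the only true dependence in the toy data. Fitting each node separately keeps the cost linear in the number of nodes and never touches the joint normalizing constant. For a block-directed model one would, under this sketch's assumptions, restrict each node's predictors to its own block plus its parent blocks and read coefficients on parent-block variables as directed edges.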
