LG | CO | Mar 18, 2025

Learning local neighborhoods of non-Gaussian graphical models: A measure transport approach

Sarah Liaw, Rebecca Morrison, Youssef Marzouk, Ricardo Baptista

arXiv:2503.13899v14.11 citations | h-index: 15 | Has Code | AAAI

Originality: Incremental advance

AI Analysis

This work addresses the limited scalability and flexibility of graphical-model structure learning, a problem relevant to statisticians and data scientists, and offers an incremental advance that generalizes existing methods such as Lasso-based neighborhood selection.
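Lasso-based neighborhood selection, the baseline named above, regresses each variable on all of the others with an l1 penalty and reads its neighborhood off the nonzero coefficients. The following is a minimal NumPy sketch using cyclic coordinate descent; the function names, the penalty `lam=0.15`, the AND rule for symmetrizing, and the simulated chain-graph data are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and y are centered, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - X @ b
    for _ in range(n_iter):
        for k in range(p):
            r += X[:, k] * b[k]        # add back coordinate k's contribution
            rho = X[:, k] @ r / n      # correlation with the partial residual
            b[k] = soft_threshold(rho, lam) / col_sq[k]
            r -= X[:, k] * b[k]
    return b


def neighborhood_selection(X, lam):
    """For each variable j, return the set of k with a nonzero Lasso coefficient."""
    n, d = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize every column
    nbrs = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        b = lasso_cd(Z[:, others], Z[:, j], lam)
        nbrs.append({others[i] for i in np.flatnonzero(b)})
    return nbrs


# Hypothetical test data: Gaussian with a chain-graph precision matrix 0-1-2-3-4.
rng = np.random.default_rng(0)
d, n = 5, 5000
Omega = np.eye(d)
for i in range(d - 1):
    Omega[i, i + 1] = Omega[i + 1, i] = -0.4
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(Omega), size=n)

nbrs = neighborhood_selection(X, lam=0.15)
# AND rule: keep edge j-k only if each variable selects the other.
edges = {(j, k) for j in range(d) for k in nbrs[j] if j < k and j in nbrs[k]}
print(sorted(edges))
```

The AND rule keeps an edge only when both endpoints select each other; an OR rule is the common, less conservative alternative.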

The paper tackles the problem of identifying conditional independence relationships in high-dimensional non-Gaussian graphical models. It proposes the L-SING algorithm, which uses transport maps to estimate each variable's local neighborhood scalably, and demonstrates its effectiveness in Gaussian and non-Gaussian settings, including a biological dataset with more than 150 variables.
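The local Markov property is what makes per-variable estimation possible. As a sketch, assume a smooth, strictly positive joint density π and measure the dependence between x_j and x_k through the mixed second derivative of log π (a generalized-precision-style score; that L-SING uses exactly this score is an assumption here). The joint log-density then splits so that only the conditional of x_j matters for row j:

```latex
% Split the joint log-density into the conditional of x_j and the marginal of the rest
\log \pi(x) = \log \pi(x_j \mid x_{-j}) + \log \pi(x_{-j})

% The marginal term does not depend on x_j, so for every k \neq j
\partial_j \partial_k \log \pi(x) = \partial_j \partial_k \log \pi(x_j \mid x_{-j})

% For smooth positive densities, x_j \perp x_k \mid x_{-\{j,k\}} iff this mixed partial
% vanishes everywhere, so the neighborhood of x_j is read off its conditional alone:
\mathcal{N}(j) = \{\, k \neq j : \mathbb{E}_\pi\big[\, |\partial_j \partial_k \log \pi(x_j \mid x_{-j})| \,\big] > 0 \,\}

% Gaussian check (zero mean, precision matrix \Omega): \partial_j \partial_k \log \pi = -\Omega_{jk}, and
\pi(x_j \mid x_{-j}) = \mathcal{N}\Big( -\sum_{k \neq j} \tfrac{\Omega_{jk}}{\Omega_{jj}}\, x_k,\ \tfrac{1}{\Omega_{jj}} \Big)
```

In the Gaussian case the neighborhood of x_j is the support of row j of Ω, which coincides with the support of the coefficients in the regression of x_j on x_{-j}; this is consistent with the abstract's statement that neighborhood selection with Lasso is a special case.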

Identifying the Markov properties or conditional independencies of a collection of random variables is a fundamental task in statistics for modeling and inference. Existing approaches often learn the structure of a probabilistic graphical model, which encodes these dependencies, by assuming that the variables follow a distribution with a simple parametric form. Moreover, the computational cost of many algorithms scales poorly for high-dimensional distributions, as they need to estimate all the edges in the graph simultaneously. In this work, we propose a scalable algorithm to infer the conditional independence relationships of each variable by exploiting the local Markov property. The proposed method, named Localized Sparsity Identification for Non-Gaussian Distributions (L-SING), estimates the graph by using flexible classes of transport maps to represent the conditional distribution for each variable. We show that L-SING includes existing approaches, such as neighborhood selection with Lasso, as a special case. We demonstrate the effectiveness of our algorithm in both Gaussian and non-Gaussian settings by comparing it to existing methods. Lastly, we show the scalability of the proposed approach by applying it to high-dimensional non-Gaussian examples, including a biological dataset with more than 150 variables.
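To illustrate why a parametric (Gaussian) conditional can miss edges that a more flexible transport map captures, the sketch below fits one map component S(x1, x2) = (x2 - f(x1)) / σ that pushes the conditional of x2 given x1 to a standard normal. This deliberately restricted map class (affine in x2 with a constant scale, polynomial in x1) is chosen for illustration and is not the parameterization used in L-SING; the toy data, the feature choices, the helper `fit_affine_map`, and the edge score E|∂1∂2 log π(x2 | x1)| are all assumptions.

```python
import numpy as np

# Hypothetical toy pair: x2 depends on x1 only through x1**2, so the
# linear correlation between x1 and x2 is close to zero.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x2 = x1**2 - 1.0 + 0.1 * rng.standard_normal(n)


def fit_affine_map(features, target):
    """Fit S(x1, x2) = (x2 - f(x1)) / sigma, with f linear in `features`.

    With a constant scale sigma, maximizing the likelihood obtained by pulling
    back a standard normal through S reduces to least squares for f, and the
    optimal sigma**2 is the mean squared residual.
    """
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    resid = target - features @ coef
    return coef, np.mean(resid**2)


ones = np.ones(n)
# Linear conditional (the Gaussian special case): f(x1) = a0 + a1 * x1
coef_lin, s2_lin = fit_affine_map(np.column_stack([ones, x1]), x2)
# Quadratic conditional: f(x1) = a0 + a1 * x1 + a2 * x1**2
coef_quad, s2_quad = fit_affine_map(np.column_stack([ones, x1, x1**2]), x2)

# For this map class, d^2/(dx1 dx2) log pi(x2 | x1) = f'(x1) / sigma**2,
# so the edge score E|f'(x1)| / sigma**2 is a Monte Carlo average over samples.
score_lin = abs(coef_lin[1]) / s2_lin
score_quad = np.mean(np.abs(coef_quad[1] + 2.0 * coef_quad[2] * x1)) / s2_quad
print(f"linear map edge score:    {score_lin:.3f}")
print(f"quadratic map edge score: {score_quad:.1f}")
```

The linear fit yields a near-zero score and would drop the edge, while the quadratic map exposes it. For Gaussian data the linear score reduces to |Ω_12|, which ties this toy back to the precision-matrix view used by Lasso neighborhood selection.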

