ML, LG, GN, AP
Jun 12, 2021

Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

arXiv:2106.06691v1
4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better latent representations of bounded-support data, particularly DNA methylation data in bioinformatics. It is an incremental advance because it builds on existing matrix factorization techniques.

The authors model DNA methylation data, whose values lie between 0 and 1, with a new non-negative matrix factorization model based on the doubly non-central beta distribution. The model improves out-of-sample predictive performance over state-of-the-art methods in bioinformatics.

We present a new non-negative matrix factorization model for $(0,1)$ bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of $(0,1)$ bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. Our model improves out-of-sample predictive performance on both real and synthetic DNA methylation datasets over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
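
To make the DNCB distribution more concrete, here is a minimal sampling sketch, not the authors' code. It relies on the standard fact that a doubly non-central beta variate can be written as a Poisson mixture of ordinary beta variates: draw two Poisson counts, then draw a beta whose shape parameters are shifted by those counts. The function name `sample_dncb`, the argument names, and the choice to use the non-centrality parameters directly as Poisson rates are assumptions for illustration; some texts use half the non-centrality parameter as the rate.

```python
import numpy as np

def sample_dncb(a, b, lam1, lam2, size, rng):
    """Sample a doubly non-central beta via its Poisson-mixture form.

    Assumed parameterization (illustrative, may differ from the paper):
      y1 ~ Poisson(lam1), y2 ~ Poisson(lam2), x ~ Beta(a + y1, b + y2).
    """
    y1 = rng.poisson(lam1, size=size)  # latent count shifting the first shape
    y2 = rng.poisson(lam2, size=size)  # latent count shifting the second shape
    return rng.beta(a + y1, b + y2)

rng = np.random.default_rng(0)

# With both non-centrality parameters at zero, this is an ordinary Beta(2, 3).
central = sample_dncb(2.0, 3.0, 0.0, 0.0, 50_000, rng)

# A large first non-centrality parameter pushes mass toward 1.
shifted = sample_dncb(2.0, 3.0, 10.0, 0.0, 50_000, rng)

print(central.mean(), shifted.mean())
```

Setting both non-centrality parameters to zero recovers the plain beta distribution, which is the sense in which the DNCB generalizes it. The latent Poisson counts in this representation are also the kind of auxiliary variable that augmentation-based inference schemes typically exploit, although the specific augmentations used in the paper are not reproduced here.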

