LG · Mar 5

Count Bridges enable Modeling and Deconvolving Transcriptomic Data

arXiv:2603.04730v1
Originality: Highly original
AI Analysis

This work provides a principled framework for generative modeling and deconvolution of biological count data, aimed at researchers working with RNA sequencing and similar count-based assays.

This paper introduces Count Bridges, a stochastic bridge process on the integers that offers an exact, tractable analogue of diffusion-style models for count data. The authors report state-of-the-art performance on integer distribution matching benchmarks and apply the method to modeling single-cell gene expression, deconvolving bulk RNA-seq, and resolving multicellular spatial transcriptomic spots into single-cell profiles.

Many modern biological assays, including RNA sequencing, yield integer-valued counts that reflect the number of molecules detected. These measurements are often not at the desired resolution: while the unit of interest is typically a single cell, many measurement technologies produce counts aggregated over sets of cells. Although recent generative frameworks such as diffusion and flow matching have been extended to non-Euclidean and discrete settings, it remains unclear how best to model integer-valued data or how to systematically deconvolve aggregated observations. We introduce Count Bridges, a stochastic bridge process on the integers that provides an exact, tractable analogue of diffusion-style models for count data, with closed-form conditionals for efficient training and sampling. We extend this framework to enable direct training from aggregated measurements via an Expectation-Maximization-style approach that treats unit-level counts as latent variables. We demonstrate state-of-the-art performance on integer distribution matching benchmarks, comparing against flow matching and discrete flow matching baselines across various metrics. We then apply Count Bridges to two large-scale problems in biology: modeling single-cell gene expression data at the nucleotide resolution, with applications to deconvolving bulk RNA-seq, and resolving multicellular spatial transcriptomic spots into single-cell count profiles. Our methods offer a principled foundation for generative modeling and deconvolution of biological count data across scales and modalities.
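The abstract does not spell out the bridge construction or the EM-style update, so the sketch below is not the paper's method. It only illustrates, with two classical textbook facts, what "closed-form conditionals" and "unit-level counts as latent variables" can look like for integer data. First, for a homogeneous Poisson process started at 0 and pinned to the value x1 at time 1, the count at time t is Binomial(x1, t). Second, independent Poisson counts conditioned on their sum are Multinomial with probabilities proportional to their rates. The function names `poisson_bridge_sample` and `split_aggregate`, and the example rates, are hypothetical and chosen for illustration only.

```python
import random


def poisson_bridge_sample(x1, t, rng):
    """Sample N_t given N_0 = 0 and N_1 = x1 for a homogeneous Poisson process.

    Classical fact: the conditional law is Binomial(x1, t), so each of the
    x1 events is kept independently with probability t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if x1 < 0:
        raise ValueError("x1 must be a non-negative integer")
    return sum(1 for _ in range(x1) if rng.random() < t)


def split_aggregate(total, rates, rng):
    """Sample unit-level counts given only their aggregate.

    Classical fact: independent Poisson(rate_i) counts conditioned on their
    sum are Multinomial(total, rate_i / sum(rates)). This is the kind of
    latent-variable step an EM-style scheme could use (an assumption here,
    not the paper's stated update).
    """
    counts = [0] * len(rates)
    for idx in rng.choices(range(len(rates)), weights=rates, k=total):
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # Intermediate state of a bridge pinned at 40 counts, one quarter of the way in.
    print(poisson_bridge_sample(40, 0.25, rng))
    # Split a bulk count of 100 across three hypothetical cells.
    print(split_aggregate(100, [1.0, 2.0, 7.0], rng))
```

Both samplers return integers by construction, with no rounding or relaxation to a continuous space; this is the property the abstract emphasizes for count data.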

