Poisson Hierarchical Indian Buffet Processes: With Indications for Microbiome Species Sampling Models
The PHIBP provides a flexible multivariate count model for microbiome data analysis, with broader applications in genetics, commerce, and text analysis.
The authors introduce the Poisson Hierarchical Indian Buffet Process (PHIBP) to model complex, sparse count data by enabling information sharing across groups, with applications in microbiome analysis. The framework provides tractable Bayesian inference and exact generative sampling, and it addresses the unseen species problem.
We introduce the Poisson Hierarchical Indian Buffet Process (PHIBP), a new class of species sampling models designed to address the challenges of complex, sparse count data by facilitating information sharing across and within groups. Our theoretical developments enable a tractable Bayesian nonparametric framework with machine learning elements, accommodating a potentially infinite number of species (taxa) whose parameters are learned from data. Focusing on microbiome analysis, we address key gaps by providing a flexible multivariate count model that accounts for overdispersion and robustly handles diverse data types, such as operational taxonomic units (OTUs) and amplicon sequence variants (ASVs). We introduce novel parameters reflecting species abundance and diversity. The model borrows strength across groups while explicitly distinguishing between technical and biological zeros to interpret sparse co-occurrence patterns. This results in a framework with tractable posterior inference, exact generative sampling, and a principled solution to the unseen species problem. We describe extensions where domain experts can incorporate knowledge through covariates and structured priors, with potential for strain-level analysis. While motivated by ecology, our work provides a broadly applicable methodology for hierarchical count modeling in genetics, commerce, and text analysis, and has significant implications for the broader theory of species sampling models arising in probability and statistics.
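The abstract does not spell out the PHIBP construction itself. To build intuition for three ideas it names (borrowing strength across groups, technical versus biological zeros, and species that are present but unseen), here is a sketch of a generic finite-truncation hierarchical gamma-Poisson count simulator. This is NOT the authors' PHIBP: the truncation level `K`, the presence probability, the concentration `c`, the fixed sequencing `depth`, and every function name (`simulate`, `zero_breakdown`, `unseen_present`) are hypothetical choices made only for illustration.

```python
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    if lam <= 0.0:
        return 0
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def gamma(rng: random.Random, shape: float, scale: float) -> float:
    """Gamma(shape, scale) draw; a non-positive shape yields 0.0 instead of raising."""
    return rng.gammavariate(shape, scale) if shape > 0.0 else 0.0


def simulate(n_groups=3, samples_per_group=4, K=200, theta=20.0, c=5.0,
             presence=0.6, depth=50.0, seed=0):
    """Toy hierarchical gamma-Poisson counts (hypothetical stand-in, not the PHIBP)."""
    rng = random.Random(seed)
    # Level 1: species weights shared by all groups (finite gamma-process
    # truncation; the expected total mass is theta).
    w = [gamma(rng, theta / K, 1.0) for _ in range(K)]
    groups = []
    for _ in range(n_groups):
        # Hypothetical presence indicators: False marks a truly absent taxon.
        present = [rng.random() < presence for _ in range(K)]
        # Level 2: group rates with mean w[k]; a larger c keeps groups closer to
        # the shared profile, which is how this toy "borrows strength".
        rates = [gamma(rng, c * w[k], 1.0 / c) if present[k] else 0.0
                 for k in range(K)]
        # Level 3: Poisson counts per sample at a fixed, hypothetical depth.
        counts = [[poisson(rng, depth * rates[k]) for k in range(K)]
                  for _ in range(samples_per_group)]
        groups.append({"present": present, "rates": rates, "counts": counts})
    return w, groups


def zero_breakdown(group):
    """Split observed zeros into biological (taxon absent) and technical (present, undetected)."""
    biological = technical = 0
    for row in group["counts"]:
        for k, n in enumerate(row):
            if n == 0:
                if group["present"][k]:
                    technical += 1
                else:
                    biological += 1
    return biological, technical


def unseen_present(group):
    """Number of taxa present in the group whose total observed count is zero."""
    K = len(group["present"])
    totals = [sum(row[k] for row in group["counts"]) for k in range(K)]
    return sum(1 for k in range(K) if group["present"][k] and totals[k] == 0)


if __name__ == "__main__":
    _, groups = simulate()
    for j, g in enumerate(groups):
        bio, tech = zero_breakdown(g)
        print(f"group {j}: biological zeros={bio}, technical zeros={tech}, "
              f"present-but-unseen taxa={unseen_present(g)}")
```

Because the simulator knows which taxa are truly absent, it can label every zero exactly; with real microbiome data that labelling is latent and has to be inferred, which is the kind of distinction the abstract says the PHIBP makes explicit. The present-but-unseen count is only a simulation-side analogue of the unseen species problem, not an estimator of it.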