Modeling cumulative biological phenomena with Suppes-Bayes Causal Networks
This work addresses the challenge of understanding disease progression for researchers in computational biology. Its contribution is incremental: it builds on existing methods and adds theoretical refinements.
The paper models the accumulation of DNA changes in diseases such as cancer and HIV with Suppes-Bayes Causal Networks (SBCNs), a framework that infers the ordering of mutations from data using Bayesian inference and regularization strategies. An application to HIV data demonstrates its utility.
Several diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes relative to wildtype conditions. Cancer and HIV are two common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, cooperation and parasitism among distinct cellular clones. Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes' theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model-selection strategies with regularization. In this paper we discuss the theoretical foundations of our approach and investigate in depth how the model-selection task is influenced by (i) the poset based on Suppes' theory and (ii) different regularization strategies. Furthermore, we provide an example application of our framework to HIV genetic data, highlighting the valuable insights provided by the inferred SBCN.
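To make the notion of a poset based on Suppes' theory concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the data are binary samples (rows) over events (columns), and it assumes the two conditions commonly used to formalize Suppes' probabilistic causation for cumulative phenomena: temporal priority, approximated by P(cause) > P(effect), and probability raising, P(effect | cause) > P(effect | not cause). The function name `suppes_poset` and the toy data are hypothetical.

```python
def suppes_poset(samples):
    """Return a matrix where poset[i][j] is True if event i satisfies both
    assumed Suppes conditions with respect to event j.

    samples: list of rows, each a list of 0/1 flags (1 = event observed).
    """
    n_events = len(samples[0])

    def prob(rows, j):
        # Empirical probability of event j within the given rows.
        return sum(row[j] for row in rows) / len(rows)

    poset = [[False] * n_events for _ in range(n_events)]
    for i in range(n_events):
        with_i = [row for row in samples if row[i]]
        without_i = [row for row in samples if not row[i]]
        # Conditional probabilities are undefined if event i is
        # always or never observed, so such events are skipped.
        if not with_i or not without_i:
            continue
        for j in range(n_events):
            if i == j:
                continue
            # Temporal priority (assumption): in a cumulative process,
            # earlier events are observed more often than later ones.
            temporal_priority = prob(samples, i) > prob(samples, j)
            # Probability raising: observing i makes j more likely.
            probability_raising = prob(with_i, j) > prob(without_i, j)
            poset[i][j] = temporal_priority and probability_raising
    return poset


# Hypothetical toy data: three events accumulating roughly in order 0, 1, 2.
samples = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
poset = suppes_poset(samples)
```

In a sketch like this, the resulting matrix would act as a constraint on the search space: a likelihood-based model-selection step with regularization would only consider arcs that the poset allows.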