Active learning of digenic functions with boolean matrix logic programming
This work addresses computational and empirical challenges in cellular engineering and biological discovery, offering a realistic approach to a self-driving lab for microbial engineering.
The paper tackles the problem of learning intricate genetic interactions in genome-scale metabolic network models (GEMs) by introducing Boolean Matrix Logic Programming (BMLP) with active learning, which learns gene pair interactions from fewer training examples than random experimentation.
We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, building on genome-scale metabolic network models (GEMs), which are comprehensive databases of metabolic processes. GEMs do not always predict host behaviours correctly, and learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by using active learning to guide informative experimentation. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models and offers a realistic approach to a self-driving lab for microbial engineering.
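To illustrate the general idea of evaluating a recursive datalog program with boolean matrices, the following is a minimal sketch and not the $BMLP_{active}$ implementation. It assumes a heavily simplified toy network in which each fact edge(X, Y) says that some reaction converts metabolite X into metabolite Y; the metabolite names, the `FACTS` list, and all function names are hypothetical. The program path(X, Y) :- edge(X, Y). path(X, Y) :- path(X, Z), path(Z, Y). is evaluated as the least fixpoint of repeated boolean matrix multiplication.

```python
# Hypothetical toy facts: edge(X, Y) means some reaction converts X into Y.
# These names are illustrative only and are not taken from a real GEM.
FACTS = [
    ("glucose", "g6p"),
    ("g6p", "f6p"),
    ("f6p", "pyruvate"),
    ("acetate", "acetyl_coa"),
]


def index_nodes(facts):
    """Assign each constant appearing in the facts a row/column index."""
    names = sorted({name for pair in facts for name in pair})
    return {name: i for i, name in enumerate(names)}


def to_matrix(facts, idx):
    """Encode the edge/2 facts as a square boolean adjacency matrix."""
    n = len(idx)
    matrix = [[False] * n for _ in range(n)]
    for source, target in facts:
        matrix[idx[source]][idx[target]] = True
    return matrix


def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is OR over k of a[i][k] AND b[k][j]."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def bool_or(a, b):
    """Element-wise boolean OR of two matrices of the same shape."""
    return [[x or y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transitive_closure(matrix):
    """Least fixpoint of path = edge OR (path x path), i.e. the path/2 relation."""
    closure = matrix
    while True:
        updated = bool_or(closure, bool_matmul(closure, closure))
        if updated == closure:
            return closure
        closure = updated


def reachable(facts, source, target):
    """Answer the query path(source, target) from the closure matrix."""
    idx = index_nodes(facts)
    closure = transitive_closure(to_matrix(facts, idx))
    return closure[idx[source]][idx[target]]
```

In this sketch, a gene knockout could be simulated by dropping the facts for the reactions that the gene enables and re-running the query, so that a change in reachability stands in for a change in predicted phenotype. Real GEM reactions have multiple substrates and products, so a faithful encoding would need a richer representation than this pairwise adjacency matrix.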