Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine
This work addresses the need for interpretable models in genomics, specifically for predicting antibiotic resistance in a human pathogen. It appears incremental, however, since it applies an existing method to a new domain.
The authors tackle the problem of learning interpretable models for discrete phenotypes from whole genome sequences, using the Set Covering Machine with a k-mer representation of the genomes. They demonstrate that extremely sparse and biologically relevant models can be learned, using the prediction of resistance to four antibiotics in Pseudomonas aeruginosa as the test case.
The increased affordability of whole genome sequencing has motivated its use for phenotypic studies. We address the problem of learning interpretable models for discrete phenotypes from whole genomes. We propose a general approach that relies on the Set Covering Machine and a k-mer representation of the genomes. We show results for the problem of predicting the resistance of Pseudomonas aeruginosa, an important human pathogen, against four antibiotics. Our results demonstrate that extremely sparse, biologically relevant models can be learnt using this approach.
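
To make the approach concrete, the following is a minimal sketch of a greedy Set Covering Machine that learns a conjunction of k-mer presence/absence rules. It is not the authors' implementation. The function names (`kmers`, `train_scm`, `predict`), the penalty parameter `p`, the rule cap `max_rules`, and the toy sequences are all assumptions made for illustration. The utility used to pick each rule (negative examples covered minus `p` times positive examples misclassified) follows the usual greedy formulation of the Set Covering Machine.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, bool]  # (k-mer, True = must be present / False = must be absent)


def kmers(genome: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def train_scm(genomes: List[str], labels: List[int], k: int,
              p: float = 1.0, max_rules: int = 3) -> List[Rule]:
    """Greedily build a conjunction of presence/absence rules.

    The model predicts 1 (e.g. resistant) only if every rule holds.
    `p` and `max_rules` are hypothetical hyperparameters for this sketch.
    """
    kmer_sets = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*kmer_sets))
    candidates = [(km, want) for km in vocabulary for want in (True, False)]

    def holds(rule: Rule, i: int) -> bool:
        km, want = rule
        return (km in kmer_sets[i]) == want

    negatives = {i for i, y in enumerate(labels) if y == 0}
    positives = {i for i, y in enumerate(labels) if y == 1}
    model: List[Rule] = []

    while negatives and len(model) < max_rules:
        best_rule, best_utility, best_covered = None, float("-inf"), 0
        for rule in candidates:
            # Negatives on which the rule fails are correctly rejected.
            covered = sum(1 for i in negatives if not holds(rule, i))
            # Positives on which the rule fails become errors.
            errors = sum(1 for i in positives if not holds(rule, i))
            utility = covered - p * errors
            if utility > best_utility:
                best_rule, best_utility, best_covered = rule, utility, covered
        if best_rule is None or best_covered == 0:
            break  # no remaining rule rejects any further negative example
        model.append(best_rule)
        # Keep only the examples that the chosen rule has not yet decided.
        negatives = {i for i in negatives if holds(best_rule, i)}
        positives = {i for i in positives if holds(best_rule, i)}
    return model


def predict(model: List[Rule], genome: str, k: int) -> int:
    """Predict 1 if all rules of the conjunction hold, else 0."""
    present = kmers(genome, k)
    return int(all((km in present) == want for km, want in model))


# Toy data, invented for illustration: label 1 = resistant, 0 = susceptible.
K = 3
toy_genomes = ["AAGGTCCA", "CCGGTCAT", "AAGGACCA", "CCGGACAT"]
toy_labels = [1, 1, 0, 0]
toy_model = train_scm(toy_genomes, toy_labels, K)
toy_predictions = [predict(toy_model, g, K) for g in toy_genomes]
```

On this toy data a single rule is enough to separate the two classes, which mirrors the sparsity claimed in the abstract: the learned model can be read directly as "this k-mer is present (or absent), hence resistant". A real application would enumerate k-mers over full genomes and select `p` and `max_rules` by cross-validation, which this sketch does not attempt.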