Machine Learning for Antimicrobial Resistance
This work addresses the underrepresentation of biological datasets in the Data-for-Good community, focusing on AMR, a global health issue. Its contribution is incremental, since it applies existing methods to new data.
The study applied standard ensemble machine learning techniques to antimicrobial resistance (AMR) data, achieving classification accuracies from mid-90% to low-80% depending on sample size and successfully identifying gene regions associated with AMR.
Biological datasets amenable to applied machine learning are more available today than ever before, yet they remain underrepresented in the Data-for-Good community. Here we present a work-in-progress case study that analyzes antimicrobial resistance (AMR) using standard ensemble machine learning techniques, and we note the successes and pitfalls such work entails. Broadly, applied machine learning (AML) techniques are well suited to AMR, with classification accuracies ranging from the mid-90% range to the low-80% range depending on sample size. Additionally, these techniques prove successful at identifying gene regions known to be associated with the AMR phenotype. We believe that the extensive amount of biological data available, the plethora of problems presented, and the global impact of such work merit the consideration of the Data-for-Good community.
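To make the general idea concrete, the following is a minimal sketch of how an ensemble classifier can both predict a resistance label and point to the sequence features that drive the prediction. It is not the pipeline used in this study, which the text above does not specify. It assumes synthetic random sequences, k-mer presence as the feature representation, and a bagged ensemble of single-feature decision stumps chosen only for brevity. The motif `GATTA`, the sequence length, and all function names are hypothetical and exist only for this illustration.

```python
import random
from collections import Counter

K = 5
MARKER = "GATTA"  # hypothetical resistance-associated motif, invented for this demo

def kmer_set(seq, k=K):
    """Return the set of k-mers present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def make_genome(rng, resistant, length=60):
    """Random synthetic sequence; resistant ones get MARKER spliced in."""
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    if resistant:
        pos = rng.randrange(length - len(MARKER) + 1)
        seq = seq[:pos] + MARKER + seq[pos + len(MARKER):]
    return seq

def stump_accuracy(feature, sample):
    """Accuracy of the rule 'k-mer present => resistant' on (kmers, label) pairs."""
    return sum((feature in kmers) == label for kmers, label in sample) / len(sample)

def fit_bagged_stumps(train, n_estimators, rng):
    """Fit one single-k-mer stump per bootstrap resample of the training set."""
    features = sorted(set().union(*(kmers for kmers, _ in train)))
    stumps = []
    for _ in range(n_estimators):
        sample = [rng.choice(train) for _ in train]  # bootstrap resample
        stumps.append(max(features, key=lambda f: stump_accuracy(f, sample)))
    return stumps

def predict(stumps, kmers):
    """Majority vote: resistant if more than half of the stumps fire."""
    votes = sum(f in kmers for f in stumps)
    return votes * 2 > len(stumps)

# Synthetic, balanced data set: 50 resistant and 50 susceptible sequences.
rng = random.Random(0)
data = [(kmer_set(make_genome(rng, label)), label)
        for label in [True, False] * 50]
rng.shuffle(data)
train, test = data[:60], data[60:]

stumps = fit_bagged_stumps(train, n_estimators=15, rng=rng)
accuracy = sum(predict(stumps, kmers) == label for kmers, label in test) / len(test)

# How often each k-mer was chosen acts as a crude feature-importance score.
top_kmer, top_count = Counter(stumps).most_common(1)[0]
print(f"held-out accuracy: {accuracy:.2f}")
print(f"most frequently selected k-mer: {top_kmer} ({top_count}/{len(stumps)} stumps)")
```

In this toy setting the planted motif is the only real signal, so the ensemble's most frequently selected k-mer should recover it. Real AMR data would call for an established library implementation, careful train/test separation across related isolates, and validation of any highlighted region against known resistance genes.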