SP AP ML · Jun 8, 2020

Interpretable Classification of Bacterial Raman Spectra with Knockoff Wavelets

Charmaine Chia, Matteo Sesia, Chi-Sing Ho, Stefanie S. Jeffrey, Jennifer Dionne, Emmanuel J. Candès, Roger T. Howe

arXiv:2006.04937v3 · 6 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for interpretable models in high-stakes biomedical decisions, such as bacterial infection identification, though it is incremental in applying existing methods to a specific domain.

The paper addresses the difficulty of interpreting complex machine learning models for biomedical signal data. It proposes a logistic regression model with wavelet features and knockoff variable selection for classifying bacterial Raman spectra, achieving accuracy comparable to neural networks while being simpler and more transparent.

Deep neural networks and other sophisticated machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decisions, including the identification of bacterial infections. In this paper, we consider fast Raman spectroscopy data and demonstrate that a logistic regression model with carefully selected features achieves accuracy comparable to that of neural networks, while being much simpler and more transparent. Our analysis leverages wavelet features with intuitive chemical interpretations, and performs controlled variable selection with knockoffs to ensure the predictors are relevant and non-redundant. Although we focus on a particular data set, the proposed approach is broadly applicable to other types of signal data for which interpretability may be important.
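As a rough illustration of the two ingredients named in the abstract, the sketch below computes wavelet coefficients of a spectrum and applies the standard knockoff+ selection rule to a vector of feature importance statistics. It is not the authors' implementation. The Haar wavelet, the function names, and the target level `q` are assumptions made for simplicity, and the sketch assumes the statistics `W` have already been computed elsewhere (for example, by comparing each wavelet feature against its knockoff copy in a penalized logistic regression); constructing the knockoffs and fitting the classifier are not shown.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar transform (an assumed wavelet choice).

    Returns [approx_L, detail_L, ..., detail_1], coarsest scale first.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        approx = (even + odd) / np.sqrt(2.0)         # local averages
    return [approx] + details[::-1]


def knockoff_plus_threshold(W, q):
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    Returns np.inf when no candidate threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return t
    return np.inf


def select_features(W, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrum = rng.normal(size=64)           # synthetic stand-in for a Raman spectrum
    features = np.concatenate(haar_dwt(spectrum, levels=3))
    print(features.shape)                    # one coefficient per input sample
    W = np.array([5, 4, 3, 2, -1, 1.5, 6, 7, 8, 9], dtype=float)  # made-up statistics
    print(select_features(W, q=0.2))
```

Large positive entries of `W` indicate that a feature looks more important than its knockoff, while negative entries serve as an estimate of how many false discoveries a given threshold would admit. That is why the rule counts negatives in the numerator.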
