Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology
This work addresses the inconsistent and time-consuming selection of preprocessing methods for Raman spectroscopy of biological samples, which could aid noninvasive health monitoring such as diabetes diagnosis, though it appears incremental because it builds on existing preprocessing methods.
The study addresses the selection of optimal preprocessing methods for Raman spectroscopy in biofluid analysis by developing a statistical technique that analyzes variability within the training spectra; applying it improved predictive models for two artificial biological fluids.
The ability of Raman spectroscopy to provide meaningful composition predictions relies heavily on a preprocessing step that removes insignificant spectral variation. This step is crucial in biofluid analysis: widespread adoption of Raman-based diagnostics requires a robust model that can withstand routine spectral discrepancies arising from unavoidable variations such as age, diet, and medical background. A wealth of preprocessing methods is available, and selecting the one that gives the best results is often left to trial and error or user experience. This process can be very time-consuming and can give inconsistent results across operators. In this study we detail a method that analyzes the statistical variability within a set of training spectra and determines its suitability for forming a robust model. This allows us to selectively qualify or exclude a preprocessing method, predetermine robustness, and simultaneously identify the number of components that will form the best predictive model. We demonstrate the ability of this technique to improve predictive models of two artificial biological fluids. Raman spectroscopy is ideal for noninvasive, nondestructive analysis. Routine health monitoring that maximizes comfort is increasingly crucial, particularly given epidemic-level diabetes diagnoses. High variability in the spectra of biological samples can hinder the adoption of Raman spectroscopy for such monitoring. Our technique determines the optimal preprocessing method for the operator, so model performance is no longer a function of user experience. We foresee this statistical technique being instrumental in widening the adoption of Raman spectroscopy as a monitoring tool in the field of biofluid analysis.
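The abstract does not specify which statistic is computed, so the following is only an illustrative sketch of the general workflow it describes: score each candidate preprocessing method on the training spectra and pick the number of components at the same time. As a stand-in for the authors' technique, it uses the cross-validated error of a principal component regression. The function names, the choice of standard normal variate (SNV) as the candidate method, and the synthetic spectra are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def identity(spectra):
    """No preprocessing (baseline candidate)."""
    return spectra

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def pcr_cv_rmse(X, y, n_components, n_folds=5):
    """Cross-validated RMSE of principal component regression.
    Stand-in metric; NOT the statistic used in the study."""
    n = len(y)
    sq_err = 0.0
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mean
        # Principal axes are estimated from the training fold only.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        V = vt[:n_components].T
        coef, *_ = np.linalg.lstsq(Xc @ V, y[train] - y_mean, rcond=None)
        pred = (X[test_idx] - x_mean) @ V @ coef + y_mean
        sq_err += np.sum((pred - y[test_idx]) ** 2)
    return np.sqrt(sq_err / n)

def rank_preprocessing(X, y, methods, max_components=6):
    """For each method, return (best component count, its CV RMSE)."""
    results = {}
    for name, fn in methods.items():
        Xp = fn(X)
        rmse = [pcr_cv_rmse(Xp, y, k) for k in range(1, max_components + 1)]
        best_k = int(np.argmin(rmse)) + 1
        results[name] = (best_k, float(rmse[best_k - 1]))
    return results

# Synthetic "biofluid" spectra (hypothetical): one analyte band whose height
# tracks concentration, one fixed reference band, plus a random per-sample
# gain and offset that mimic nuisance variation between samples.
rng = np.random.default_rng(0)
shift = np.linspace(400, 1800, 200)                      # Raman shift, cm^-1
analyte = np.exp(-0.5 * ((shift - 1125) / 15) ** 2)
reference = np.exp(-0.5 * ((shift - 1640) / 40) ** 2)
conc = rng.uniform(0, 10, size=60)
gain = rng.uniform(0.7, 1.3, size=(60, 1))
offset = rng.uniform(0, 2, size=(60, 1))
noise = rng.normal(0, 0.05, size=(60, 200))
spectra = gain * (conc[:, None] * analyte + 5 * reference) + offset + noise

results = rank_preprocessing(spectra, conc, {"none": identity, "snv": snv})
for name, (best_k, rmse) in results.items():
    print(f"{name}: best components = {best_k}, CV RMSE = {rmse:.3f}")
```

Because every candidate is scored by the same automated criterion, the choice of method and component count does not depend on the operator, which is the property the abstract emphasizes. A partial least squares model could replace the principal component regression without changing the surrounding loop.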