Surfboard: Audio Feature Extraction for Modern Machine Learning
This work provides a tool that facilitates audio analysis for researchers in the clinical domain, but its contribution is incremental because it builds on existing audio analysis packages.
The authors tackle audio feature extraction for machine learning by introducing Surfboard, an open-source Python library designed to address pain points in existing tools and to integrate with modern machine learning frameworks. They demonstrate its use on a Parkinson's disease classification task with the mPower dataset.
We introduce Surfboard, an open-source Python library for extracting audio features, with applications in the medical domain. Surfboard aims to address pain points of existing libraries and to make joint use with modern machine learning frameworks straightforward. The package can be used both programmatically in Python and through its command line interface, so it integrates easily into machine learning workflows. It builds on state-of-the-art audio analysis packages and offers multiprocessing support for large workloads. We review similar frameworks and describe Surfboard's architecture, including the clinical motivation for its features. Using the mPower dataset, we illustrate Surfboard's application to a Parkinson's disease classification task and highlight common pitfalls in existing research. The source code is released to the research community to facilitate future audio research in the clinical domain.
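The abstract mentions multiprocessing support for large workloads. The sketch below shows that general pattern: a per-recording feature function is mapped over many waveforms with Python's standard `concurrent.futures.ProcessPoolExecutor`. It assumes NumPy is available. It is not Surfboard's API. The function names (`rms_energy`, `zero_crossing_rate`, `extract_features`, `extract_batch`) and the two example features are hypothetical stand-ins chosen only for illustration.

```python
"""Hypothetical sketch of batch audio feature extraction with multiprocessing.

NOT Surfboard's API: all function names and features here are illustrative.
"""
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor

import numpy as np


def rms_energy(signal: np.ndarray) -> float:
    """Root-mean-square amplitude of a mono waveform."""
    return float(np.sqrt(np.mean(np.square(signal))))


def zero_crossing_rate(signal: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose sign bit differs."""
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))


def extract_features(signal: np.ndarray) -> dict[str, float]:
    """Compute a small, fixed set of scalar features for one recording."""
    return {
        "rms": rms_energy(signal),
        "zcr": zero_crossing_rate(signal),
    }


def extract_batch(
    signals: list[np.ndarray], n_jobs: int = 1
) -> list[dict[str, float]]:
    """Extract features for many recordings, optionally across processes.

    Results come back in the same order as `signals`.
    """
    if n_jobs <= 1:
        # Serial fallback: no process pool overhead for small batches.
        return [extract_features(s) for s in signals]
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map preserves input order; chunksize batches tasks
        # to reduce inter-process communication overhead.
        return list(pool.map(extract_features, signals, chunksize=8))
```

On platforms that start worker processes with `spawn` or `forkserver`, a call such as `extract_batch(recordings, n_jobs=4)` should sit inside an `if __name__ == "__main__":` guard so that child processes do not re-run it on import. Keeping each feature function at module level keeps it picklable, which the process pool requires.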