SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data
This tool addresses data scarcity for researchers and practitioners in fields where mechanistic knowledge is available. The contribution is incremental, however, because it builds on existing concepts for integrating simulation with ML.
The authors address the problem of limited real-world data for training machine learning models with SimbaML, an open-source tool that generates synthetic datasets from mechanistic models and integrates them into ML pipelines, enabling tasks such as transfer learning and data augmentation.
Training sophisticated machine learning (ML) models requires large datasets that are difficult or expensive to collect for many applications. If prior knowledge about system dynamics is available, mechanistic representations can be used to supplement real-world data. We present SimbaML (Simulation-Based ML), an open-source tool that unifies realistic synthetic dataset generation from ordinary differential equation-based models and the direct analysis and inclusion in ML pipelines. SimbaML conveniently enables investigating transfer learning from synthetic to real-world data, data augmentation, identifying needs for data collection, and benchmarking physics-informed ML approaches. SimbaML is available from https://pypi.org/project/simba-ml/.
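The general idea of generating noisy synthetic time series from an ODE-based model can be sketched as follows. This is a minimal illustration in plain Python and does not use SimbaML's API. The SIR epidemic model, the parameter ranges, the multiplicative Gaussian noise, and all function names (sir_rhs, rk4_step, simulate, add_noise, generate_dataset) are assumptions chosen for the example, not taken from the paper.

```python
import random


def sir_rhs(state, beta, gamma):
    """Right-hand side of a standard SIR model (illustrative choice of ODE system)."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), beta, gamma)
    k3 = sir_rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), beta, gamma)
    k4 = sir_rhs(tuple(x + dt * k for x, k in zip(state, k3)), beta, gamma)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, beta, gamma, n_steps, dt=1.0):
    """Return the noise-free trajectory as a list of (S, I, R) tuples."""
    trajectory = [tuple(initial_state)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt, beta, gamma))
    return trajectory


def add_noise(trajectory, rel_sigma, rng):
    """Apply multiplicative Gaussian noise and clip negative values to zero."""
    return [
        tuple(max(0.0, x * (1.0 + rng.gauss(0.0, rel_sigma))) for x in state)
        for state in trajectory
    ]


def generate_dataset(n_series, n_steps, seed=0):
    """Sample kinetic parameters per series, simulate, and add measurement noise."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_series):
        beta = rng.uniform(0.2, 0.5)     # infection rate (hypothetical range)
        gamma = rng.uniform(0.05, 0.15)  # recovery rate (hypothetical range)
        clean = simulate((990.0, 10.0, 0.0), beta, gamma, n_steps)
        dataset.append(add_noise(clean, rel_sigma=0.05, rng=rng))
    return dataset


# Five noisy synthetic time series, each with 51 time points of (S, I, R).
synthetic = generate_dataset(n_series=5, n_steps=50)
```

A dataset built this way could be used to pretrain a forecasting model before fine-tuning on scarce real measurements, or be mixed with real data for augmentation. Sampling the kinetic parameters per series is one simple way to obtain variability across the synthetic examples.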