Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review
The review addresses the challenge of limited data in resource-intensive biopharmaceutical processes and offers guidance for practitioners, although its contribution is incremental, as is typical of a review.
This review tackles the problem of applying machine learning in data-scarce settings such as upstream bioprocessing. It surveys and classifies methods designed for small data and evaluates their effectiveness based on reported application results.
Data is crucial for machine learning (ML) applications, yet acquiring large datasets can be costly and time-consuming, especially in complex, resource-intensive fields such as biopharmaceuticals. A key process in this industry is upstream bioprocessing, in which living cells are cultivated and optimised to produce therapeutic proteins and other biologics. The intricate nature of these processes, combined with their high resource demands, often limits data collection and results in small datasets. This comprehensive review explores ML methods designed to address the challenges posed by small data and classifies them into a taxonomy to guide practical applications. Each method in the taxonomy is analysed in depth, with a discussion of its core concepts and an evaluation of its effectiveness in tackling small data challenges, based on application results in upstream bioprocessing and related domains. By analysing how these methods approach small data challenges from different perspectives, this review provides actionable insights, identifies current research gaps, and offers guidance for leveraging ML in data-constrained environments.