Data-driven Approaches to Surrogate Machine Learning Model Development
This work addresses performance issues in surrogate models for the UK nuclear industry, but it is incremental as it adapts existing methods to a specific domain.
The authors tackled the problem of poor performance in surrogate machine learning models for nuclear engineering due to limited training data, achieving at least a 38% improvement in performance across five models by combining data augmentation, custom loss functions, and transfer learning.
We demonstrate the adaptation of three established methods to the field of surrogate machine learning model development: data augmentation, custom loss functions and transfer learning. Each of these methods has seen widespread use in machine learning; here, we apply them specifically to surrogate machine learning model development. The machine learning model that forms the basis of this work was intended to act as a surrogate for a traditional engineering model used in the UK nuclear industry. The performance of this model has previously been hampered by limited training data. Here, we demonstrate that a combination of these techniques can significantly improve model performance. We show that each of the aforementioned techniques has utility in its own right and in combination with the others. However, we see them best applied as part of a transfer learning operation. Five pre-trained surrogate models produced prior to this research were further trained with an augmented dataset and with our custom loss function. Through the combination of all three techniques, we see an improvement of at least $38\%$ in performance across the five models.
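As an illustration only, the workflow described above (continuing to train a pre-trained surrogate on an augmented dataset with a custom loss) might be sketched as follows. Everything specific here is an assumption and is not taken from this work: the one-dimensional linear surrogate, the Gaussian input jitter used for augmentation, the asymmetric squared-error loss, the data values and the names `augment`, `custom_loss` and `fine_tune` are all hypothetical stand-ins.

```python
import random


def augment(samples, copies=4, noise=0.01, seed=0):
    """Return the originals plus jittered copies (assumed augmentation scheme)."""
    rng = random.Random(seed)
    out = list(samples)
    for x, y in samples:
        for _ in range(copies):
            out.append((x + rng.gauss(0.0, noise), y))
    return out


def custom_loss(w, b, data, under_weight=2.0):
    """Asymmetric squared error: under-predictions cost `under_weight` times more."""
    total = 0.0
    for x, y in data:
        err = (w * x + b) - y
        total += (under_weight if err < 0 else 1.0) * err * err
    return total / len(data)


def fine_tune(w, b, data, lr=0.1, epochs=500, under_weight=2.0):
    """Continue training from pre-trained (w, b) by gradient descent on custom_loss."""
    n = len(data)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            k = under_weight if err < 0 else 1.0
            gw += 2.0 * k * err * x / n
            gb += 2.0 * k * err / n
        w -= lr * gw
        b -= lr * gb
    return w, b


# Hypothetical scarce target data, roughly following y = 3x + 1.
scarce = [(0.0, 1.0), (0.5, 2.6), (1.0, 4.0)]
# Stand-in for the weights of a previously trained surrogate.
pretrained_w, pretrained_b = 2.0, 0.5

augmented = augment(scarce)
w, b = fine_tune(pretrained_w, pretrained_b, augmented)
loss_before = custom_loss(pretrained_w, pretrained_b, augmented)
loss_after = custom_loss(w, b, augmented)
print(f"loss before fine-tuning: {loss_before:.4f}, after: {loss_after:.4f}")
```

The point of the sketch is the order of operations: training starts from the existing weights rather than from scratch, and both the enlarged dataset and the task-specific loss are applied during that continued training.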