Data augmentation and feature selection for automatic model recommendation in computational physics
This work addresses data scarcity and high dimensionality in classification tasks for computational physics. Its contribution is incremental, since it adapts existing techniques to a specific domain.
The paper tackles the problem of scarce, high-dimensional training data for classification tasks in computational physics by introducing a feature selection algorithm and a data augmentation algorithm. Combined with a stacking ensemble, these algorithms reach 90% accuracy on a nonlinear structural mechanics classification problem.
Classification algorithms have recently found applications in computational physics for the selection of numerical methods or models adapted to the environment and the state of the physical system. For such classification tasks, labeled training data come from numerical simulations and generally correspond to physical fields discretized on a mesh. Three difficulties arise: the lack of training data, their high dimensionality, and the non-applicability of common data augmentation techniques to physics data. This article introduces two algorithms to address these issues, one for dimensionality reduction via feature selection, and one for data augmentation. These algorithms are evaluated in combination with a wide variety of classifiers. When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they reach an accuracy of 90% on our classification problem for nonlinear structural mechanics.
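To make the classifier architecture concrete, the stacking ensemble mentioned above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network sizes, the meta-learner, or how the seven models are combined. Here the six multilayer perceptrons and the ridge logistic regression are all treated as base learners, a plain logistic regression is assumed as the meta-learner, and synthetic data from `make_classification` stands in for features extracted from simulation fields. The helper name `build_stacking_ensemble` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacking_ensemble(n_mlps=6, seed=0):
    """Stack n_mlps perceptrons and one L2-penalized logistic regression."""
    # Base learners: perceptrons that differ only by random seed (assumption).
    base_learners = [
        (
            f"mlp_{i}",
            make_pipeline(
                StandardScaler(),
                MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                              random_state=seed + i),
            ),
        )
        for i in range(n_mlps)
    ]
    # Logistic regression with scikit-learn's default L2 (ridge) penalty.
    base_learners.append(
        ("ridge_logreg",
         make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)))
    )
    # Meta-learner trained on cross-validated base predictions (assumption).
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# Synthetic stand-in for features selected from simulation data.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_ensemble()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The accuracy printed by this sketch refers to the synthetic data only and is unrelated to the 90% reported for the structural mechanics problem. In a setting like the one described, feature selection and data augmentation would be applied to the training set before the ensemble is fitted.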