Prediction of Diffusion Coefficients in Mixtures with Tensor Completion
This work provides a temperature-dependent prediction method for diffusion coefficients that is more accurate than established semi-empirical models, which matters to engineers and scientists working with chemical processes where experimental data are scarce.
This paper introduces a hybrid tensor completion method (TCM) to predict temperature-dependent diffusion coefficients at infinite dilution in binary mixtures, overcoming the restriction of existing matrix completion methods to single-temperature predictions. The TCM, trained on experimental data and on predictions from a semi-empirical model, extrapolates diffusion coefficients to temperatures between 268 K and 378 K with higher accuracy than established models. Incorporating new experimental data acquired through active learning further and substantially improves the TCM's predictive performance.
Predicting diffusion coefficients in mixtures is crucial for many applications, as experimental data remain scarce, and machine learning (ML) offers promising alternatives to established semi-empirical models. Among ML models, matrix completion methods (MCMs) have proven effective in predicting thermophysical properties, including diffusion coefficients in binary mixtures. However, MCMs are restricted to single-temperature predictions, and their accuracy depends strongly on the availability of high-quality experimental data for each temperature of interest. In this work, we address this challenge by presenting a hybrid tensor completion method (TCM) for predicting temperature-dependent diffusion coefficients at infinite dilution in binary mixtures. The TCM employs a Tucker decomposition and is jointly trained on experimental data for diffusion coefficients at infinite dilution in binary systems at 298 K, 313 K, and 333 K. Predictions from the semi-empirical SEGWE model serve as prior knowledge within a Bayesian training framework. The TCM then extrapolates linearly to any temperature between 268 K and 378 K, achieving markedly improved prediction accuracy compared to established models across all studied temperatures. To further enhance predictive performance, the experimental database was expanded using active learning (AL) strategies for targeted acquisition of new diffusion data by pulsed-field gradient (PFG) NMR measurements. Diffusion coefficients at infinite dilution in 19 solute + solvent systems were measured at 298 K, 313 K, and 333 K. Incorporating these results yields a substantial improvement in the TCM's predictive accuracy. These findings highlight the potential of combining data-efficient ML methods with adaptive experimentation to advance predictive modeling of transport properties.
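To make the tensor structure concrete, the following is a minimal sketch of how a Tucker decomposition rebuilds a solute x solvent x temperature tensor from a small core and three factor matrices, followed by a straight-line fit over the three training temperatures. It is an illustration under stated assumptions, not the authors' implementation: the function names, tensor sizes, ranks, and random factors are all hypothetical, the entries stand in for whatever quantity the model actually learns, the simple line fit is only one possible reading of "extrapolates linearly", and the Bayesian training with the SEGWE prior is not shown.

```python
import numpy as np

def tucker_reconstruct(core, U, V, W):
    """Rebuild a 3-way tensor from a Tucker core and three factor matrices."""
    # X[i, j, t] = sum over a, b, c of core[a, b, c] * U[i, a] * V[j, b] * W[t, c]
    return np.einsum("abc,ia,jb,tc->ijt", core, U, V, W)

def linear_in_temperature(values, temps, t_new):
    """Fit a straight line through per-temperature values and evaluate it at t_new."""
    slope, intercept = np.polyfit(temps, values, deg=1)
    return slope * t_new + intercept

# Hypothetical sizes and ranks, chosen only for illustration.
rng = np.random.default_rng(0)
n_solutes, n_solvents, n_temps = 4, 5, 3
r1, r2, r3 = 2, 2, 2

# Random stand-ins for factors that a real model would learn from data.
core = rng.normal(size=(r1, r2, r3))
U = rng.normal(size=(n_solutes, r1))    # solute features
V = rng.normal(size=(n_solvents, r2))   # solvent features
W = rng.normal(size=(n_temps, r3))      # temperature features

# Completed tensor: one entry per (solute, solvent, temperature) triple.
X = tucker_reconstruct(core, U, V, W)

# Training temperatures named in the abstract.
temps = np.array([298.0, 313.0, 333.0])

# Assumed reading of the linear step: a line through the three slices of one system.
pred_268 = linear_in_temperature(X[0, 1, :], temps, 268.0)
print(X.shape, float(pred_268))
```

The point of the low-rank form is that every solute, solvent, and temperature is described by a few features, so entries without measurements can still be computed from factors learned on the observed ones.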