LG CHEM-PH | Nov 17, 2023

How False Data Affects Machine Learning Models in Electrochemistry?

Krittapong Deshsorna, Luckhana Lawtrakul, Pawin Iamprasertkun

arXiv:2311.10795v2 | 12 citations | h-index: 19

Originality: Synthesis-oriented

AI Analysis

This work provides practical guidance for electrochemistry researchers on selecting robust machine learning models for noisy data, though it's incremental as it applies established methods to a specific domain.

This study investigated how different machine learning models handle noisy electrochemical data. Linear models are noise-tolerant (average error slope of 1.75 F g⁻¹ per 100% noise added) but less accurate (average error of 60.19 F g⁻¹ at 0% noise). Tree-based models are more accurate (lowest error of 23.9 F g⁻¹) but noise-sensitive (average error slope of 55.24 F g⁻¹ per 100% noise added). The stacking model combined high accuracy (intercept of 25.03 F g⁻¹) with better noise tolerance than the tree-based models (error slope of 43.58 F g⁻¹ per 100% noise added), making it the recommended approach.
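
The reported figures describe each model's test error as a straight line over the amount of noise added: the intercept is the error at 0% noise, and the slope is the extra error per 100% noise. The sketch below is a minimal, stdlib-only illustration of that bookkeeping. It assumes the errors were summarised by an ordinary least-squares line (the fitting procedure is not stated here), and the function names are hypothetical. It also uses the quoted figures to estimate where the stacking model's error line would meet the averaged linear-model line.

```python
from typing import Sequence, Tuple

# An error line is (intercept, slope): intercept = error at 0% noise,
# slope = extra error per 100% noise added (same units as the error, e.g. F g^-1).
ErrorLine = Tuple[float, float]


def fit_error_line(noise_pct: Sequence[float], errors: Sequence[float]) -> ErrorLine:
    """Ordinary least-squares fit of error = intercept + slope * (noise_pct / 100)."""
    if len(noise_pct) != len(errors) or len(noise_pct) < 2:
        raise ValueError("need at least two (noise, error) pairs of equal length")
    xs = [p / 100.0 for p in noise_pct]  # noise as a fraction, so slope is "per 100%"
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("noise levels must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, errors))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_error(line: ErrorLine, noise_pct: float) -> float:
    """Error that a fitted line predicts at a given noise level (in %)."""
    intercept, slope = line
    return intercept + slope * noise_pct / 100.0


def crossover_pct(a: ErrorLine, b: ErrorLine) -> float:
    """Noise level (in %) at which two error lines predict the same error."""
    (ia, sa), (ib, sb) = a, b
    if sa == sb:
        raise ValueError("lines with equal slopes never cross")
    return 100.0 * (ib - ia) / (sa - sb)


if __name__ == "__main__":
    # Figures quoted in the abstract. Pairing the averaged linear-model intercept
    # and slope into a single line is a simplifying assumption.
    stack = (25.03, 43.58)
    linear = (60.19, 1.75)
    print(f"STACK at 50% noise: {predicted_error(stack, 50):.2f} F g^-1")
    print(f"crossover: {crossover_pct(stack, linear):.1f}% noise")
```

Run as a script, it reports about 46.82 F g⁻¹ for STACK at 50% noise and a crossover near 84% noise. Read literally, the two summary lines would keep STACK ahead of the averaged linear models up to roughly that noise level. This is an extrapolation from averaged figures, not a result stated in the paper.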

Machine learning models have recently been selected based only on the data distribution, without considering noise in the data. This study aims to identify which models perform well on noisy data and to establish whether stacking machine learning models actually provide robustness to otherwise noise-sensitive models. The electrochemical data were tested with 12 standalone models (XGB, LGBM, RF, GB, ADA, NN, ELAS, LASS, RIDGE, SVM, KNN, and DT) and a stacking model (STACK). Linear models handle noise well, with an average error slope of 1.75 F g⁻¹ per 100% noise added, but they suffer in prediction accuracy, with an average error of 60.19 F g⁻¹ already at 0% noise. Tree-based models handle noise poorly (average slope of 55.24 F g⁻¹ per 100% noise added) but provide higher prediction accuracy than linear models (lowest error of 23.9 F g⁻¹). To resolve this trade-off between prediction accuracy and noise handling, a stacking model was constructed. It shows both high accuracy (intercept of 25.03 F g⁻¹) and good noise handling (slope of 43.58 F g⁻¹), making stacking models a relatively low-risk and viable choice for both beginner and experienced machine learning researchers in electrochemistry. Although neural networks (NN) are gaining popularity in the electrochemistry field, this study shows that NN is not suitable for electrochemical data, and improper tuning results in a model that is susceptible to noise. STACK models therefore offer a practical benefit: even with untuned base models, they can achieve an accurate and noise-tolerant model. Overall, this work provides insight into machine learning model selection for electrochemical data, which should aid the understanding of data science in a chemistry context.
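
The recommended STACK model combines base learners through a meta-learner trained on their predictions. The abstract does not state which base learners or meta-learner were used, so the following is only a minimal, stdlib-only sketch of the stacking technique under stated assumptions. It uses two hypothetical base learners: a one-feature least-squares line standing in for the noise-tolerant linear family, and a k-nearest-neighbour average swapped in for a flexible, noise-sensitive model to keep the code short. The meta-learner is a convex two-model blend fit on out-of-fold predictions. Library implementations typically use a full regression model instead (scikit-learn's `StackingRegressor` defaults to `RidgeCV` as its final estimator).

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical base learner 1: one-feature least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs: Sequence[float], ys: Sequence[float], k: int = 3) -> Predictor:
    """Hypothetical base learner 2: average of the k nearest training targets."""
    points = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def out_of_fold(learner: Learner, xs: Sequence[float], ys: Sequence[float],
                n_folds: int = 5) -> List[float]:
    """Predict every sample with a model that never saw it (interleaved K-fold)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        predict = learner([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, n_folds):  # indices held out in this fold
            preds[i] = predict(xs[i])
    return preds


def blend_weight(a: Sequence[float], b: Sequence[float], ys: Sequence[float]) -> float:
    """Weight w in [0, 1] minimising the squared error of w*a + (1 - w)*b."""
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        return 0.5  # identical predictions: any weight gives the same blend
    w = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys)) / den
    return min(1.0, max(0.0, w))  # objective is convex in w, so clipping is exact


def fit_stack(xs: Sequence[float], ys: Sequence[float],
              n_folds: int = 5) -> Tuple[float, Predictor]:
    """Fit the meta-learner on out-of-fold predictions, then refit bases on all data."""
    w = blend_weight(out_of_fold(fit_knn, xs, ys, n_folds),
                     out_of_fold(fit_linear, xs, ys, n_folds), ys)
    knn, lin = fit_knn(xs, ys), fit_linear(xs, ys)
    return w, lambda x: w * knn(x) + (1.0 - w) * lin(x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: a curved trend plus Gaussian noise (not the paper's dataset).
    xs = [i / 10 for i in range(60)]
    ys = [x ** 2 + rng.gauss(0.0, 1.0) for x in xs]
    w, stack = fit_stack(xs, ys)
    print(f"weight on k-NN: {w:.2f}; prediction at x=2.5: {stack(2.5):.2f}")
```

Fitting the meta-learner on out-of-fold predictions, rather than on in-sample ones, keeps it from simply trusting whichever base model memorises the training data best. Because w = 0 and w = 1 are both candidates, the blended out-of-fold error is never worse than that of the better single base model; on unseen data this is a tendency, not a guarantee.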
