Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades
This provides an efficient, quantitative tool for materials scientists to analyze irradiation damage in displacement cascades without manual tuning, though it is incremental as it combines existing ML techniques for a specific domain.
The authors tackled the problem of detecting and classifying atomic-scale defects in materials after neutron irradiation by developing an unsupervised machine learning pipeline that uses autoencoders and clustering on SOAP descriptors. The method successfully identified 99.7% of outlier atoms as defects and achieved high correlation (R2 > 0.89) between cluster sizes and defect counts in materials like Ni, Fe70Ni10Cr20, and Zr.
Neutron irradiation produces, within a few picoseconds, displacement cascades that are sequences of atomic collisions generating point and extended defects which subsequently affects the long-term evolution of materials. The diversity of these defects, characterized morphologically and statistically, defines what is called the "primary damage". In this work, we present a fully unsupervised machine learning (ML) workflow that detects and classifies these defects directly from molecular dynamics data. Local environments are encoded by the Smooth Overlap of Atomic Positions (SOAP) vector, anomalous atoms are isolated with autoencoder neural networks (AE), embedded with Uniform Manifold Approximation and Projection (UMAP) and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Applied to 80 keV displacement cascades in Ni, Fe$_7$0Ni$_{10}$Cr$_{20}$, and Zr, the AE successfully identify the small fraction of outlier atoms that participate in defect formation. HDBSCAN then partitions the UMAP latent space of AE-flagged SOAP descriptors into well defined groups representing vacancy- and interstitial-dominated regions and, within each, separates small from large aggregates, assigning 99.7 % of outliers to compact physical motifs. A signed cluster-identification score confirms this separation, and cluster size scales with net defect counts (R2 > 0.89). Statistical cross analyses between the ML outlier map and several conventional detectors (centrosymmetry, dislocation extraction, etc.) reveal strong overlap and complementary coverage, all achieved without template or threshold tuning. This ML workflow thus provides an efficient tool for the quantitative mapping of structural anomalies in materials, particularly those arising from irradiation damage in displacement cascades.