Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus
This work addresses the need for machine learning methods that predict topological invariants in materials science, but its contribution is incremental because it focuses on a synthetic dataset rather than real-world applications.
The authors tackle the problem of predicting characteristic classes of materials, such as the Chern number, which are computationally expensive to calculate with first-principles methods. To this end they introduce the Haldane bundle dataset of synthetically generated complex line bundles on the 2-torus. They show that this dataset is difficult for off-the-shelf architectures, so it can serve as a testing ground for machine learning approaches that incorporate topological and geometric priors.
Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the remarkable properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic class associated with their electronic band structure, the first Chern class. Given the importance of these classes to next-generation computing and the computational challenge of calculating them using first-principles approaches, there is a need to develop machine learning approaches to predict the characteristic classes associated with a material system. To aid in this program, we introduce the {\emph{Haldane bundle dataset}}, which consists of synthetically generated complex line bundles on the $2$-torus. We envision this dataset, which is not as challenging as noisy and sparsely measured real-world datasets but (as we show) still difficult for off-the-shelf architectures, to be a testing ground for architectures that incorporate the rich topological and geometric priors underlying characteristic classes.
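To make the prediction target concrete, the following is a minimal sketch of how the Chern number of a line bundle on the $2$-torus can be computed numerically. It is not the dataset's generation procedure, which is not described here. As assumptions for illustration only, it uses the two-band Qi-Wu-Zhang lattice model as a stand-in Bloch Hamiltonian (the hypothetical helper `qwz_hamiltonian` with mass parameter `m`), takes the lower-band eigenvector at each momentum as the fiber of the line bundle, and sums gauge-invariant plaquette phases on a discretized torus in the style of the Fukui-Hatsugai-Suzuki lattice method.

```python
import numpy as np

def qwz_hamiltonian(kx, ky, m):
    """Two-band Bloch Hamiltonian (Qi-Wu-Zhang model).

    Assumption: used only as an illustrative stand-in; it is not the
    construction behind the Haldane bundle dataset.
    """
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return np.sin(kx) * sx + np.sin(ky) * sy + (m + np.cos(kx) + np.cos(ky)) * sz

def chern_number(m, n=24):
    """Lattice estimate of the Chern number of the lower-band line bundle."""
    ks = 2 * np.pi * np.arange(n) / n  # periodic n x n grid on the torus
    u = np.empty((n, n, 2), dtype=complex)
    for i, kx in enumerate(ks):
        for j, ky in enumerate(ks):
            # eigh returns eigenvalues in ascending order; column 0 is the lower band
            _, vecs = np.linalg.eigh(qwz_hamiltonian(kx, ky, m))
            u[i, j] = vecs[:, 0]

    def link(a, b):
        # Overlap <a|b> at every grid point
        return np.einsum("ijk,ijk->ij", a.conj(), b)

    ux = link(u, np.roll(u, -1, axis=0))  # link from k to k + x
    uy = link(u, np.roll(u, -1, axis=1))  # link from k to k + y
    # Product of links around each plaquette; arbitrary eigenvector phases cancel
    plaq = ux * np.roll(uy, -1, axis=0) * np.roll(ux, -1, axis=1).conj() * uy.conj()
    flux = np.angle(plaq)  # discretized Berry flux through each plaquette
    return int(np.rint(flux.sum() / (2 * np.pi)))

if __name__ == "__main__":
    for m in (-1.0, 1.0, 3.0):
        print(f"m = {m:+.1f}: Chern number = {chern_number(m)}")
```

The plaquette product is used because the phase of each numerically computed eigenvector is arbitrary, so only gauge-invariant combinations are meaningful. The overall sign of the result depends on orientation and band conventions, so only its magnitude and its change between parameter regimes should be read from this sketch.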