Linear algebra with transformers
This work addresses the challenge of enabling AI models to handle numerical tasks. Its contribution is incremental: it applies existing transformer methods to a new domain, linear algebra.
The paper tackles the problem of teaching transformers to perform linear algebra computations, such as matrix operations and eigenvalue decomposition, by training them on random matrices. The trained models achieve over 90% accuracy, are robust to noise, and generalize to unseen matrix types.
Transformers can learn to perform numerical computations from examples only. I study nine problems of linear algebra, from basic matrix operations to eigenvalue decomposition and inversion, and introduce and discuss four encoding schemes to represent real numbers. On all problems, transformers trained on sets of random matrices achieve high accuracies (over 90%). The models are robust to noise, and can generalize out of their training distribution. In particular, models trained to predict Laplace-distributed eigenvalues generalize to different classes of matrices: Wigner matrices or matrices with positive eigenvalues. The reverse is not true.
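To make the idea of representing real numbers as tokens concrete, here is a minimal sketch of one possible encoding. It is illustrative only and is not claimed to be any of the four schemes the abstract refers to. The function names (encode_float, decode_float, encode_matrix), the sign/mantissa/exponent layout, the "V" dimension tokens, and the choice of three significant digits are all assumptions made for this example.

```python
import math

import numpy as np


def encode_float(x, digits=3):
    """Encode a real number as three tokens: sign, integer mantissa, exponent.

    Hypothetical scheme for illustration: x is rounded to `digits`
    significant digits and written as sign * mantissa * 10**exponent.
    """
    if x == 0:
        return ["+", "0", "E0"]
    sign = "+" if x > 0 else "-"
    exponent = math.floor(math.log10(abs(x))) - (digits - 1)
    mantissa = round(abs(x) / 10.0 ** exponent)
    if mantissa == 10 ** digits:  # rounding overflow, e.g. 9.996 -> 1000
        mantissa //= 10
        exponent += 1
    return [sign, str(mantissa), f"E{exponent}"]


def decode_float(tokens):
    """Invert encode_float, returning the rounded value as a float."""
    sign, mantissa, exponent = tokens
    value = int(mantissa) * 10.0 ** int(exponent[1:])
    return value if sign == "+" else -value


def encode_matrix(m, digits=3):
    """Encode a 2-D array as two dimension tokens plus row-major entries."""
    rows, cols = m.shape
    tokens = [f"V{rows}", f"V{cols}"]
    for x in m.flatten():
        tokens.extend(encode_float(float(x), digits))
    return tokens


# Example: a random symmetric matrix and its eigenvalues, as one
# (input sequence, target values) pair.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
sym = (a + a.T) / 2                      # symmetric, so eigenvalues are real
eigenvalues = np.linalg.eigvalsh(sym)

input_tokens = encode_matrix(sym)
decoded = [decode_float(encode_float(float(v))) for v in eigenvalues]

# Rounding to 3 significant digits keeps each value within 0.5% of the original.
within_tolerance = np.allclose(decoded, eigenvalues, rtol=5e-3, atol=0.0)
```

With an encoding of this kind, a prediction can only match the true value up to the rounding error of the encoding, so an accuracy figure for such a model would have to be defined relative to some tolerance. The 0.5% bound here follows from the three-digit rounding chosen for this sketch, not from the source.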