QUANT-PH · LG · Mar 21, 2024

Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention

arXiv:2403.14753v2 · 13 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work targets the scalability problems of large machine learning models such as GPT, whose applications include text and image prediction. The advance is incremental: it adapts quantum methods to the existing transformer framework.

The authors address the parameter growth and high computational cost of classical transformer models by introducing SASQuaTCh, a variational quantum transformer architecture built on kernel-based self-attention. They claim an exponential improvement in parameter and run-time complexity over the classical counterpart and, using only 9 qubits, report high accuracy on a handwritten-digit image classification task in simulation and on hardware.

The recent exploding growth in size of state-of-the-art machine learning models highlights a well-known issue where exponential parameter growth, which has grown to trillions as in the case of the Generative Pre-trained Transformer (GPT), leads to training time and memory requirements which limit their advancement in the near term. The predominant models use the so-called transformer network and have a large field of applicability, including predicting text and images, classification, and even predicting solutions to the dynamics of physical systems. Here we present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), which builds networks of qubits that perform analogous operations of the transformer network, namely the keystone self-attention operation, and leads to an exponential improvement in parameter complexity and run-time complexity over its classical counterpart. Our approach leverages recent insights from kernel-based operator learning in the context of predicting spatiotemporal systems to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks in simulation and with hardware, where with only 9 qubits and a handful of parameters we are able to simultaneously embed and classify a grayscale image of handwritten digits with high accuracy.
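
The abstract describes self-attention as a kernel operation carried out between quantum Fourier transforms. The sketch below is a small classical simulation of that general pattern (transform, multiply by a kernel in Fourier space, transform back), not the authors' circuit. The function names `qft_matrix` and `fourier_kernel_mixing` are hypothetical, and the choice of a diagonal phase kernel is an assumption made to keep the operation unitary; the paper's variational kernel may differ.

```python
import numpy as np

def qft_matrix(n_qubits: int) -> np.ndarray:
    """Unitary matrix of the quantum Fourier transform on n_qubits."""
    dim = 2 ** n_qubits
    idx = np.arange(dim)
    # Entry (j, k) is exp(2*pi*i*j*k / dim) / sqrt(dim).
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

def fourier_kernel_mixing(state: np.ndarray, phases: np.ndarray) -> np.ndarray:
    """Apply QFT, a diagonal phase kernel, then the inverse QFT.

    Assumption: the kernel is diagonal in the Fourier basis with
    unit-modulus entries exp(i * phases), so the whole map is unitary.
    """
    n_qubits = state.size.bit_length() - 1  # state.size must be a power of 2
    F = qft_matrix(n_qubits)
    kernel = np.exp(1j * phases)
    return F.conj().T @ (kernel * (F @ state))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_qubits = 3
    dim = 2 ** n_qubits

    # Random normalized state standing in for an embedded input.
    state = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    state /= np.linalg.norm(state)

    # Trainable parameters in this sketch: one phase per Fourier mode.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=dim)

    mixed = fourier_kernel_mixing(state, phases)
    print("output norm:", np.linalg.norm(mixed))
```

Because the kernel acts pointwise in the Fourier basis, the combined map is a circular convolution of the amplitudes, which is the sense in which a Fourier-space kernel mixes information across all positions at once.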

