Multimodal Quantum Vision Transformer for Enzyme Commission Classification from Biochemical Representations
This work addresses enzyme classification in computational biology with a multimodal approach whose novelty is incremental: it combines existing quantum and vision transformer methods.
The paper tackles the challenge of predicting enzyme functionality by developing a multimodal Quantum Machine Learning framework that integrates four biochemical modalities. The framework achieves a top-1 accuracy of 85.1% and outperforms sequence-only baselines.
Accurately predicting enzyme functionality remains one of the major challenges in computational biology, particularly for enzymes with limited structural annotations or sequence homology. We present a novel multimodal Quantum Machine Learning (QML) framework that enhances Enzyme Commission (EC) classification by integrating four complementary biochemical modalities: protein sequence embeddings, quantum-derived electronic descriptors, molecular graph structures, and 2D molecular image representations. The framework is built on a Quantum Vision Transformer (QVT) backbone equipped with modality-specific encoders and a unified cross-attention fusion module. By integrating graph features and spatial patterns, our method captures key stereoelectronic interactions behind enzyme function. Experimental results demonstrate that our multimodal QVT model achieves a top-1 accuracy of 85.1%, outperforming sequence-only baselines by a substantial margin and performing better than other QML models.
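
To make the idea of fusing one embedding per modality through cross-attention concrete, the following is a minimal, purely classical sketch. It is not the paper's QVT implementation. The function name `cross_attention_fusion`, the embedding width of 8, the random projection matrices, and the mean-pooling step are all illustrative assumptions. The quantum layers and the modality-specific encoders are omitted, with random vectors standing in for encoder outputs.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, mat):
    """Multiply a length-d_in vector by a d_in x d_out matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]


def cross_attention_fusion(tokens, w_q, w_k, w_v):
    """Hypothetical fusion step: each modality token attends to all tokens.

    tokens: list of per-modality embeddings, all of the same width d.
    Returns the mean-pooled fused vector and the attention weight matrix.
    """
    q = [project(t, w_q) for t in tokens]
    k = [project(t, w_k) for t in tokens]
    v = [project(t, w_v) for t in tokens]
    scale = math.sqrt(len(k[0]))
    # Scaled dot-product attention: one row of weights per query modality.
    weights = [
        softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        for qi in q
    ]
    attended = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    # Mean-pool the attended tokens into one fused vector (an assumption).
    n = len(attended)
    fused = [sum(tok[c] for tok in attended) / n
             for c in range(len(attended[0]))]
    return fused, weights


random.seed(0)
d = 8  # illustrative embedding width, not taken from the paper
modalities = ["sequence", "electronic", "graph", "image"]
# Random stand-ins for the outputs of the four modality-specific encoders.
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in modalities]
w_q, w_k, w_v = (
    [[random.gauss(0, d ** -0.5) for _ in range(d)] for _ in range(d)]
    for _ in range(3)
)
fused, weights = cross_attention_fusion(tokens, w_q, w_k, w_v)
```

In a full model, `fused` would feed a classification head over the EC classes. Each row of `weights` shows how strongly one modality attends to the others.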