LG | AI | QM | Jul 18, 2025

A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design

arXiv:2507.13646v1 | 1 citation | h-index: 7 | IEEE Trans Artif Intell
Originality: Synthesis-oriented
AI Analysis

This is an incremental review paper for researchers in bioinformatics, summarizing the state of the art in applying Transformer models to protein sequence tasks.

This paper reviews recent advances in Transformer-based language models for protein sequence analysis and design, covering applications like gene ontology, protein identification, and de novo protein generation, while analyzing strengths and weaknesses of existing works.

Transformer-based language models have had an unprecedented impact on Natural Language Processing (NLP), and their success has led to their adoption in other fields, including bioinformatics. This paper discusses recent advances in Transformer-based models for protein sequence analysis and design. We discuss and analyse a significant number of works on such applications, which encompass gene ontology, functional and structural protein identification, generation of de novo proteins, and protein binding. We attempt to shed light on the strengths and weaknesses of the discussed works to give readers comprehensive insight. Finally, we highlight shortcomings in existing research and explore potential avenues for future developments. We believe this review will give researchers in this field an overview of the state of the art and help orient their future studies.

