GN · LG · Nov 13, 2023

To Transformers and Beyond: Large Language Models for the Genome

Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

arXiv:2311.07621v111.750 citations · h-index: 14

Originality: Synthesis-oriented

AI Analysis

The paper serves as an educational guide for computational biologists and computer scientists on a fundamental shift in genomic data analysis. As a review, its contribution is incremental rather than novel.

This review examines the application of Large Language Models (LLMs), primarily based on transformers, to genomics, highlighting their strengths and limitations compared to traditional methods like CNNs and RNNs, and discusses future directions beyond transformers.

In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore both the strengths and limitations of transformers and other LLMs for genomics. Additionally, we contemplate the future of genomic modeling beyond the transformer architecture based on current trends in research. The paper aims to serve as a guide for computational biologists and computer scientists interested in LLMs for genomic data. We hope the paper can also serve as an educational introduction and discussion for biologists to a fundamental shift in how we will be analyzing genomic data in the future.
