Can Large Language Models Predict Antimicrobial Resistance Gene?
This work addresses the need for more flexible DNA sequence analysis tools in bioinformatics, although its contribution is incremental: it applies existing generative models to a new domain.
The study investigates whether generative large language models, rather than traditional encoder-based models, can predict antimicrobial resistance genes from DNA sequences. It finds that they offer comparable or potentially better predictions and remain flexible when given both sequence and textual information.
This study demonstrates that generative large language models can be used more flexibly for DNA sequence analysis and classification than traditional transformer encoder-based models. While recent encoder-based models such as DNABERT and Nucleotide Transformer have shown strong performance in DNA sequence classification, transformer decoder-based generative models have not yet been extensively explored in this field. This study evaluates how effectively generative large language models handle DNA sequences with various labels and analyzes how performance changes when additional textual information is provided. Experiments were conducted on antimicrobial resistance genes, and the results show that generative large language models can offer comparable or potentially better predictions, demonstrating flexibility and accuracy when incorporating both sequence and textual information. The code and data used in this work are available at the following GitHub repository: https://github.com/biocomgit/llm4dna.
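To make the idea of combining sequence and textual information concrete, the following is a minimal sketch of how a DNA sequence and optional free text might be formatted as a prompt for a generative model, and how the generated answer might be mapped back to a class label. It is not taken from the repository above: the label set, the prompt wording, and the helper names `build_prompt` and `parse_label` are all hypothetical, and the model call itself is left out.

```python
from typing import Optional, Sequence

# Hypothetical label set; the source does not list the classes it uses.
LABELS = ("beta-lactam", "aminoglycoside", "tetracycline")


def build_prompt(sequence: str, labels: Sequence[str],
                 context: Optional[str] = None) -> str:
    """Format a DNA sequence (plus optional free text) as a classification prompt."""
    lines = [
        "Classify the antimicrobial resistance gene below.",
        "Answer with exactly one of: " + ", ".join(labels) + ".",
    ]
    if context:
        # Extra textual information is simply added as another prompt line.
        lines.append("Additional information: " + context.strip())
    lines.append("DNA sequence: " + sequence.strip().upper())
    lines.append("Answer:")
    return "\n".join(lines)


def parse_label(generated: str, labels: Sequence[str]) -> Optional[str]:
    """Return the label mentioned earliest in the generated text, or None."""
    text = generated.lower()
    hits = [(text.find(label.lower()), label)
            for label in labels if label.lower() in text]
    return min(hits)[1] if hits else None
```

With an encoder-based classifier, adding textual information would typically require changing the input pipeline or the model head. In this sketch it only changes the prompt string, which is one way to read the flexibility claimed above.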