QM | LG | Oct 21, 2024

A Fusion-Driven Approach of Attention-Based CNN-BiLSTM for Protein Family Classification -- ProFamNet

arXiv:2410.17293v1 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses protein family classification for bioinformatics, offering a more efficient model with improved performance, though it is incremental as it builds on existing deep learning techniques.

The study tackled protein family classification by developing ProFamNet, a fusion model that combines a 1D-CNN, a BiLSTM, and an attention mechanism. It achieved an F1 score of 98.30%, compared with 97.67% for the prior state-of-the-art model, while using fewer parameters and fewer training epochs.
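To make the attention component concrete, the following is a minimal sketch of attention pooling over a sequence of per-position feature vectors, such as the outputs of a BiLSTM. It is an illustration under assumptions, not ProFamNet's actual layer: the dot-product scoring, the `query` vector, the function names `softmax` and `attention_pool`, and the toy values are all hypothetical, since the summary does not specify the attention formulation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse a sequence of vectors into one context vector.

    hidden_states: list of per-position feature vectors (assumed to come
                   from a sequence encoder such as a BiLSTM).
    query:         hypothetical learned vector used to score each position.
    """
    # Score each position by its dot product with the query vector.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three sequence positions, two features each.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 1.0]
weights, context = attention_pool(toy_states, toy_query)
print(weights)
print(context)
```

In this toy run the third position receives the largest weight because it aligns best with the query, so it contributes most to the pooled vector that a classifier head would then consume.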

Advanced automated AI techniques allow us to classify protein sequences and discern their biological families and functions. Conventional approaches for classifying these protein families often focus on extracting N-Gram features from the sequences while overlooking crucial motif information and the interplay between motifs and neighboring amino acids. Recently, convolutional neural networks have been applied to amino acid and motif data, even with a limited dataset of well-characterized proteins, resulting in improved performance. This study presents a model for classifying protein families using the fusion of 1D-CNN, BiLSTM, and an attention mechanism, which combines spatial feature extraction, long-term dependencies, and context-aware representations. The proposed model (ProFamNet) achieved superior model efficiency with 450,953 parameters and a compact size of 1.72 MB, outperforming the state-of-the-art model with 4,578,911 parameters and a size of 17.47 MB. Further, we achieved a higher F1 score (98.30% vs. 97.67%) with more instances (271,160 vs. 55,077) in fewer training epochs (25 vs. 30).
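The reported model sizes can be sanity-checked from the parameter counts. The sketch below assumes each parameter is stored as a 32-bit float (4 bytes) and that "MB" means 2^20 bytes; neither assumption is stated in the abstract, and the helper name `model_size_mib` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4   # assumption: weights stored as 32-bit floats
BYTES_PER_MIB = 1024 ** 2  # assumption: "MB" in the abstract means 2**20 bytes


def model_size_mib(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Estimate the on-disk size of the weights alone, in MiB."""
    return num_params * bytes_per_param / BYTES_PER_MIB


# Parameter counts as quoted in the abstract.
PROFAMNET_PARAMS = 450_953
BASELINE_PARAMS = 4_578_911

profamnet_mib = model_size_mib(PROFAMNET_PARAMS)
baseline_mib = model_size_mib(BASELINE_PARAMS)
param_ratio = BASELINE_PARAMS / PROFAMNET_PARAMS

print(f"ProFamNet: {profamnet_mib:.2f} MiB")   # ProFamNet: 1.72 MiB
print(f"Baseline:  {baseline_mib:.2f} MiB")    # Baseline:  17.47 MiB
print(f"Baseline has {param_ratio:.1f}x the parameters")
```

Under these assumptions the estimates match the quoted 1.72 MB and 17.47 MB, which suggests the reported sizes reflect float32 weights with little additional overhead; the baseline carries roughly ten times as many parameters.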

