PepTriX: A Framework for Explainable Peptide Analysis through Protein Language Models
This work addresses the need for generalizable and interpretable peptide analysis in bioinformatics and drug discovery. It offers a solution that bridges predictive performance and domain understanding, although the contribution is incremental in that it combines existing methods.
The paper tackles peptide classification for tasks such as predicting toxicity and HIV inhibition. It introduces PepTriX, a framework that integrates 1D sequence embeddings and 3D structural features using a graph attention network with contrastive training and cross-modal co-attention. The framework achieves strong performance across multiple tasks and provides interpretable insights into biological motifs.
Peptide classification tasks, such as predicting toxicity and HIV inhibition, are fundamental to bioinformatics and drug discovery. Traditional approaches rely heavily on handcrafted encodings of one-dimensional (1D) peptide sequences, which can limit generalizability across tasks and datasets. Recently, protein language models (PLMs), such as ESM-2 and ESMFold, have demonstrated strong predictive performance. However, they face two critical challenges. First, fine-tuning is computationally costly. Second, their complex latent representations hinder interpretability for domain experts. Additionally, many frameworks have been developed for one specific type of peptide classification and do not generalize to others. These limitations restrict the ability to connect model predictions to biologically relevant motifs and structural properties. To address them, we present PepTriX, a novel framework that integrates 1D sequence embeddings and three-dimensional (3D) structural features via a graph attention network enhanced with contrastive training and cross-modal co-attention. PepTriX automatically adapts to diverse datasets, producing task-specific peptide vectors while retaining biological plausibility. In an evaluation with domain experts, we found that PepTriX performs strongly across multiple peptide classification tasks and provides interpretable insights into the structural and biophysical motifs that drive predictions. Thus, PepTriX offers both predictive robustness and interpretable validation, bridging the gap between performance-driven PLMs and domain-level understanding in peptide research.
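To make the idea of cross-modal co-attention concrete, the following is a minimal sketch in plain Python. It is not the PepTriX implementation: the function name co_attention, the scaled dot-product affinity, and the assumption that both modalities have already been projected to the same feature dimension are illustrative choices, and the graph attention network and contrastive training are left out entirely. The sketch only shows how per-residue sequence features and per-node structural features can attend to each other, and how the resulting attention weights could be inspected.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(mat):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    """Multiply two list-of-rows matrices (a: n x k, b: k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def co_attention(seq_feats, struct_feats):
    """Hypothetical co-attention between two modalities.

    seq_feats:    L x d matrix, one row per residue (1D sequence view).
    struct_feats: N x d matrix, one row per graph node (3D structure view).
    Both are ASSUMED to share the feature dimension d.
    """
    d = len(seq_feats[0])
    scale = 1.0 / math.sqrt(d)

    # affinity[i][j]: scaled dot product of residue i and structure node j.
    raw = matmul(seq_feats, transpose(struct_feats))
    affinity = [[scale * v for v in row] for row in raw]

    # Each residue distributes attention over structure nodes, and vice versa.
    seq_to_struct = [softmax(row) for row in affinity]             # L x N
    struct_to_seq = [softmax(row) for row in transpose(affinity)]  # N x L

    # Context vectors: each modality summarised from the other's viewpoint.
    seq_ctx = matmul(seq_to_struct, struct_feats)     # L x d
    struct_ctx = matmul(struct_to_seq, seq_feats)     # N x d
    return seq_ctx, struct_ctx, seq_to_struct, struct_to_seq


# Toy usage with made-up numbers (3 residues, 2 structure nodes, d = 2).
toy_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_struct = [[1.0, 0.0], [0.0, 1.0]]
seq_ctx, struct_ctx, w_seq, w_struct = co_attention(toy_seq, toy_struct)
```

In this sketch each row of w_seq and w_struct sums to one, so a row can be read as how strongly one residue or node attends to the elements of the other modality. Weights of this kind are one plausible place to look for the sequence and structure motifs that drive a prediction, under the assumptions stated above.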