LG, BM, QM | Feb 17, 2025

Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning

Jiayang Zhang, Xianyuan Liu, Wei Wu, Sina Tabakhi, Wenrui Fan, Shuo Zhou, Kang Lan Tee, Tuck Seng Wong, Haiping Lu

arXiv:2502.12049v17.13 citations | h-index: 23 | Has Code | EMBC

Originality: Synthesis-oriented

AI Analysis

This work addresses the time-consuming, purification-intensive experimental methods currently used to determine VLP stoichiometry. Its contribution is incremental: it applies existing machine learning techniques to a new biological dataset.

The authors tackle the problem of classifying the stoichiometry of virus-like particles (VLPs) for vaccine optimization. They curate a new dataset and train interpretable linear machine learning models, which classify stoichiometry while identifying key protein sequence features.

Virus-like particles (VLPs) are valuable for vaccine development due to their immune-triggering properties. Understanding their stoichiometry, the number of protein subunits that form a VLP, is critical for vaccine optimisation. However, current experimental methods to determine stoichiometry are time-consuming and require highly purified proteins. To classify protein stoichiometry efficiently, we curate a new dataset and propose an interpretable, data-driven pipeline leveraging linear machine learning models. We also explore the impact of feature encoding on model performance and interpretability, as well as methods to identify key protein sequence features influencing classification. The evaluation of our pipeline demonstrates that it can classify stoichiometry while revealing protein features that possibly influence VLP assembly. The data and code used in this work are publicly available at https://github.com/Shef-AIRE/StoicIML.
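
To make the general idea of the abstract concrete (encode sequences as features, fit a linear model, then read its weights), here is a minimal sketch in plain Python. It is not the authors' pipeline: the amino-acid composition encoding, the hand-written logistic regression, the toy sequences, the binary labels 0/1, and every function name are assumptions chosen only for illustration. The actual encodings and models are in the linked repository.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Encode a sequence as 20 amino-acid frequencies (an assumed encoding)."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit a binary logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            g = p - yi                      # gradient of the log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return class 1 if the linear score is positive, else class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def top_features(w, k=2):
    """Rank residues by absolute weight: the interpretability step."""
    ranked = sorted(zip(AMINO_ACIDS, w), key=lambda t: abs(t[1]), reverse=True)
    return ranked[:k]


# Hypothetical toy data: class 0 is lysine-rich, class 1 is leucine-rich.
# These sequences and labels are invented and carry no biological meaning.
toy_seqs = ["KKAKKGKK", "KAKKKEKK", "LLALLGLL", "LALLLELL"]
toy_labels = [0, 0, 1, 1]

X = [composition(s) for s in toy_seqs]
w, b = train_logreg(X, toy_labels)

print([predict(w, b, x) for x in X])  # predictions on the training sequences
print(top_features(w))                # residues with the largest |weight|
```

Because the model is linear, each weight maps directly to one input feature, so ranking weights by magnitude gives a simple account of which residues drive a prediction. A richer encoding would change which features are ranked, but not this reading of the weights.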
