Feb 17, 2025

Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning

arXiv:2502.12049v1 | 3 citations | h-index: 23 | Has Code | EMBC
Originality: Synthesis-oriented
AI Analysis

This work addresses the time-consuming and purification-intensive experimental methods for determining VLP stoichiometry. Its contribution is incremental: it applies existing machine learning techniques to a new biological dataset.

The authors address the problem of classifying the stoichiometry of virus-like particles (VLPs) for vaccine optimization. They curate a new dataset and train interpretable linear machine learning models, which classify stoichiometry while identifying key protein sequence features.

Virus-like particles (VLPs) are valuable for vaccine development due to their immune-triggering properties. Understanding their stoichiometry, the number of protein subunits to form a VLP, is critical for vaccine optimisation. However, current experimental methods to determine stoichiometry are time-consuming and require highly purified proteins. To efficiently classify stoichiometry classes in proteins, we curate a new dataset and propose an interpretable, data-driven pipeline leveraging linear machine learning models. We also explore the impact of feature encoding on model performance and interpretability, as well as methods to identify key protein sequence features influencing classification. The evaluation of our pipeline demonstrates that it can classify stoichiometry while revealing protein features that possibly influence VLP assembly. The data and code used in this work are publicly available at https://github.com/Shef-AIRE/StoicIML.
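The general idea of pairing a simple feature encoding with a linear classifier, then reading feature importance off the learned weights, can be sketched as follows. This is a minimal illustration only and not the authors' pipeline (their code is in the repository linked above). The amino-acid composition encoding, the toy sequences, the two class labels, and the helper names `composition`, `train_logreg`, and `predict` are all assumptions made for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Encode a protein sequence as 20 amino-acid frequencies (assumed encoding)."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def train_logreg(X, y, lr=0.5, epochs=2000, l2=1e-3):
    """Fit binary logistic regression with batch gradient descent and L2 penalty."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (g / m + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Return class 1 if the linear score is positive, else class 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Hypothetical toy data: invented sequences, two invented stoichiometry classes.
sequences = ["LLALLGLLAL", "ALLLGLLLAV", "KKAKKGKKAK", "AKKKGKKKAV"]
y = [0, 0, 1, 1]
X = [composition(s) for s in sequences]

weights, bias = train_logreg(X, y)

# Interpretability: rank residues by the magnitude of their learned weight.
ranked = sorted(zip(AMINO_ACIDS, weights), key=lambda t: abs(t[1]), reverse=True)
for residue, weight in ranked[:3]:
    print(f"{residue}: {weight:+.3f}")
```

Because the model is linear, each weight maps directly to one input feature, so the ranking shows which residues push a sequence toward either class. With a richer encoding the same inspection applies, but each weight must then be traced back to the feature it represents.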

Code Implementations: 1 repo
