SOC-PHLG Apr 27, 2025

Metric Similarity and Manifold Learning of Circular Dichroism Spectra of Proteins

arXiv:2504.19355v2
h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses protein structure analysis for bioinformatics researchers, but it is incremental as it applies existing methods to a specific dataset.

The study analyzes circular dichroism spectra of globular proteins using machine learning. It finds that the optimal transport-based Wasserstein distance is robust to noise and consistent with the Euclidean and Manhattan metrics, while the t-SNE embedding separates the proteins into distinct groups according to their secondary structure composition.

We present a machine learning analysis of circular dichroism spectra of globular proteins from the SP175 database, using the optimal transport-based $1$-Wasserstein distance $\mathcal{W}_1$ (order $p=1$) and the manifold learning algorithm $t$-SNE. Our results demonstrate that $\mathcal{W}_1$ is consistent with both the Euclidean and Manhattan metrics while exhibiting robustness to noise. In addition, $t$-SNE uncovers meaningful structure in the high-dimensional data. The clustering in the $t$-SNE embedding is primarily determined by proteins with distinct secondary structure compositions: one cluster predominantly contains $\beta$-rich proteins, while the other consists mainly of proteins with mixed $\alpha/\beta$ and $\alpha$-helical content.
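To make the comparison of metrics concrete, the following is a minimal sketch of how a $1$-Wasserstein distance between two signals on a shared wavelength grid might be computed next to the Euclidean and Manhattan distances. It is not the authors' implementation. The synthetic Gaussian bands, the 190 to 250 nm grid, and the shift-and-normalize preprocessing in `to_weights` are all assumptions made for illustration; the abstract does not say how signed spectra are turned into distributions.

```python
import math


def to_weights(signal):
    """Shift a signal to be non-negative and normalize it to unit mass.

    Assumption: this preprocessing is purely illustrative; the paper's
    treatment of signed CD intensities is not described in the abstract.
    """
    lowest = min(signal)
    shifted = [s - lowest for s in signal]
    total = sum(shifted)
    if total == 0:
        raise ValueError("signal is constant and cannot be normalized")
    return [s / total for s in shifted]


def wasserstein_1(grid, u, v):
    """W1 between two discrete distributions on the same sorted grid.

    In one dimension, W1 equals the integral of the absolute difference
    between the two cumulative distribution functions.
    """
    cdf_u = cdf_v = 0.0
    distance = 0.0
    for i in range(len(grid) - 1):
        cdf_u += u[i]
        cdf_v += v[i]
        distance += abs(cdf_u - cdf_v) * (grid[i + 1] - grid[i])
    return distance


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def gaussian_band(grid, center, width=5.0):
    """Hypothetical single-band signal used as stand-in data."""
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]


# Hypothetical wavelength grid in nm; not taken from SP175.
grid = [190.0 + i for i in range(61)]
a = to_weights(gaussian_band(grid, 208.0))
b = to_weights(gaussian_band(grid, 212.0))  # small shift relative to a
c = to_weights(gaussian_band(grid, 235.0))  # large shift relative to a

for name, other in (("b", b), ("c", c)):
    print(name,
          "W1 =", round(wasserstein_1(grid, a, other), 3),
          "L2 =", round(euclidean(a, other), 3),
          "L1 =", round(manhattan(a, other), 3))
```

In this toy setting $\mathcal{W}_1$ keeps growing with the size of the band shift, whereas the pointwise Manhattan distance cannot exceed 2 for unit-mass inputs once the bands no longer overlap. A matrix of such pairwise distances could in principle be passed to a manifold learning routine, but that step is not sketched here.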

