BIO-PH · BM · QM · ML · Dec 20, 2017

Unsupervised learning of dynamical and molecular similarity using variance minimization

arXiv:1712.07704v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better similarity assessment in molecular systems to avoid overfitting in supervised learning, though it is incremental as it applies an existing clustering method to new contexts.

The authors group molecular systems by dynamical or structural similarity using an unsupervised method based on Ward's minimum variance clustering. Applied to tripeptide simulations, the method gives insight into protein dynamics; applied to chemoinformatic datasets, it motivates data splits for supervised learning.

In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides, using the information-theoretic Jensen-Shannon divergence between Markovian transition matrices, in order to gain insight into how point mutations affect protein dynamics. Then, we extend the method to partition two chemoinformatic datasets according to structural similarity, motivating a train/validation/test split for supervised learning that avoids overfitting.
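The first step described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the distance between two transition matrices is the square root of the mean row-wise Jensen-Shannon divergence (the abstract does not say how rows are combined), and it uses SciPy's Ward linkage on the resulting precomputed distances. The function names `js_divergence`, `matrix_distance`, and `ward_labels`, and the four toy 2-state matrices, are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Clamp at zero to guard against tiny negative round-off.
    return max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)


def matrix_distance(P, Q):
    """ASSUMPTION: average the row-wise divergences, then take the square root.

    The square root of the JS divergence is a metric, which suits Ward's
    method better than the raw divergence.
    """
    row_divs = [js_divergence(p, q) for p, q in zip(P, Q)]
    return float(np.sqrt(np.mean(row_divs)))


def ward_labels(matrices, n_clusters):
    """Cluster row-stochastic matrices with Ward linkage on pairwise distances."""
    n = len(matrices)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = matrix_distance(matrices[i], matrices[j])
    # linkage() expects the condensed (upper-triangle) form of the distances.
    Z = linkage(squareform(D), method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: two "slow" and two "fast" 2-state transition matrices.
toy_matrices = [
    np.array([[0.90, 0.10], [0.10, 0.90]]),
    np.array([[0.88, 0.12], [0.12, 0.88]]),
    np.array([[0.50, 0.50], [0.50, 0.50]]),
    np.array([[0.52, 0.48], [0.48, 0.52]]),
]
labels = ward_labels(toy_matrices, n_clusters=2)
print(labels)
```

On this toy input the two slow matrices share one label and the two fast matrices share the other. SciPy documents Ward linkage as correctly defined only for Euclidean distances, so feeding it precomputed distances is a choice the caller has to justify.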
