LG | AI | Jan 13, 2025

PROTECT: Protein circadian time prediction using unsupervised learning

arXiv:2501.07405v1 | 1 citation | h-index: 2 | iScience
Originality: Incremental advance
AI Analysis

The paper addresses a gap in circadian phase prediction from proteomic data and offers a tool for researchers studying diseases such as Alzheimer's. The advance is incremental because it adapts existing unsupervised techniques to a new data type.

The paper tackled the problem of predicting circadian phases from proteomic data, which lacks time labels and prior knowledge, by developing an unsupervised deep learning method; it achieved high accuracy on labeled data and identified circadian disruptions in Alzheimer's disease subjects.
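Comparing predicted phases with known time labels needs a circular error, since 23:00 and 01:00 are two hours apart, not twenty-two. The sketch below shows one common way to score this. The function names, the 24-hour period, and the search over a global shift and direction are assumptions made for illustration; the summary does not state which accuracy metric the paper used.

```python
import numpy as np

def circular_error_hours(predicted, actual, period=24.0):
    """Smallest absolute difference between clock times on a circle."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    diff = np.abs(predicted - actual) % period
    return np.minimum(diff, period - diff)

def median_aligned_error(predicted, actual, period=24.0, n_shifts=240):
    """Median circular error after the best global shift and direction.

    Assumption of this sketch: an unsupervised model recovers phase only
    up to a rotation and a reflection, so both are searched before scoring.
    """
    predicted = np.asarray(predicted, dtype=float)
    best = np.inf
    for direction in (1.0, -1.0):
        for shift in np.linspace(0.0, period, n_shifts, endpoint=False):
            candidate = (direction * predicted + shift) % period
            err = np.median(circular_error_hours(candidate, actual, period))
            best = min(best, err)
    return best

# Hypothetical example: predictions are a reflected, shifted copy of the labels.
labels = np.array([0.0, 6.0, 12.0, 18.0])
predictions = (5.0 - labels) % 24.0
print(circular_error_hours([23.0], [1.0]))
print(median_aligned_error(predictions, labels))
```

The grid search over shifts is coarse but easy to read; a finer grid or a closed-form circular mean of the differences could replace it.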

Circadian rhythms regulate the physiology and behavior of humans and animals. Despite advancements in understanding these rhythms and predicting circadian phases at the transcriptional level, predicting circadian phases from proteomic data remains elusive. This challenge is largely due to the scarcity of time labels in proteomic datasets, which are often characterized by small sample sizes, high dimensionality, and significant noise. Furthermore, existing methods for predicting circadian phases from transcriptomic data typically rely on prior knowledge of known rhythmic genes, making them unsuitable for proteomic datasets. To address this gap, we developed a novel computational method using unsupervised deep learning techniques to predict circadian sample phases from proteomic data without requiring time labels or prior knowledge of proteins or genes. Our model involves a two-stage training process optimized for robust circadian phase prediction: an initial greedy one-layer-at-a-time pre-training, which generates informative initial parameters, followed by fine-tuning. During fine-tuning, a specialized loss function guides the model to align protein expression levels with circadian patterns, enabling it to accurately capture the underlying rhythmic structure within the data. We tested our method on both time-labeled and unlabeled proteomic data. For labeled data, we compared our predictions to the known time labels and achieved high accuracy. For unlabeled human datasets, including postmortem brain regions and urine samples, we explored circadian disruptions. Notably, our analysis identified disruptions in rhythmic proteins between Alzheimer's disease and control subjects across these samples.
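The abstract does not specify the form of the fine-tuning loss. To make concrete what "aligning protein expression levels with circadian patterns" can mean, the sketch below uses a generic cosinor-style residual as a stand-in: given candidate sample phases, each protein is fitted to a single sinusoid by least squares, and the mean squared residual measures how well the phases explain the data. The function name `cosinor_residual_loss` and the synthetic data are hypothetical, and this is not presented as the authors' loss.

```python
import numpy as np

def cosinor_residual_loss(expression, phases):
    """Mean squared residual after fitting each protein to one sinusoidal cycle.

    expression: (n_samples, n_proteins) array of abundances.
    phases: (n_samples,) array of candidate sample phases in radians.
    """
    expression = np.asarray(expression, dtype=float)
    phases = np.asarray(phases, dtype=float)
    # Design matrix with intercept, cosine, and sine columns per sample.
    design = np.column_stack(
        [np.ones_like(phases), np.cos(phases), np.sin(phases)]
    )
    # Least-squares fit of all proteins at once (one coefficient column each).
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    residual = expression - design @ coef
    return float(np.mean(residual ** 2))

# Synthetic check (illustrative data only): the true phases should give a
# lower loss than the same phases assigned to the wrong samples.
rng = np.random.default_rng(0)
true_phases = rng.uniform(0.0, 2.0 * np.pi, size=40)
peak_offsets = np.linspace(0.0, 2.0 * np.pi, 5, endpoint=False)
expression = np.cos(true_phases[:, None] - peak_offsets[None, :])
expression += 0.01 * rng.standard_normal(expression.shape)

loss_true = cosinor_residual_loss(expression, true_phases)
loss_shuffled = cosinor_residual_loss(expression, rng.permutation(true_phases))
print(loss_true, loss_shuffled)
```

In a trainable model the phases would be the output of the network, and a differentiable version of this residual would be minimised with respect to its weights; the closed-form fit here only illustrates the scoring idea.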
