APLG · Feb 15, 2022

Phenotyping with Positive Unlabelled Learning for Genome-Wide Association Studies

arXiv:2202.07451v1
Originality: Incremental advance
AI Analysis

This work addresses noise in phenotypic classification for genome-wide association studies (GWAS), a problem that matters for understanding disease biology in healthcare and the life sciences. It appears incremental, however, because it builds on existing methods such as anchor learning and transformers.

The paper tackles phenotypic misclassification in electronic health records (EHRs), which reduces the power of GWAS to detect associations. Its proposed model, AnchorBERT, detected genomic associations previously found only in large consortium studies that used 5× more cases, and, when the number of controls was halved, retained 40% more significant GWAS-catalog associations than standard phenotype definitions.

Identifying phenotypes plays an important role in furthering our understanding of disease biology through practical applications within healthcare and the life sciences. The challenge of dealing with the complexities and noise within electronic health records (EHRs) has motivated applications of machine learning in phenotypic discovery. While recent research has focused on finding predictive subtypes for clinical decision support, here we instead focus on the noise that results in phenotypic misclassification, which can reduce a phenotype's ability to detect associations in genome-wide association studies (GWAS). We show that by combining anchor learning and transformer architectures into our proposed model, AnchorBERT, we are able to detect genomic associations previously found only in large consortium studies with 5× more cases. When reducing the number of controls available by 50%, we find our model is able to maintain 40% more significant genomic associations from the GWAS catalog compared to standard phenotype definitions.

Keywords: Phenotyping · Machine Learning · Semi-Supervised · Genetic Association Studies · Biological Discovery
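The abstract does not spell out how anchor learning turns noisy EHR labels into a phenotype. The following is a minimal sketch of the classic positive-unlabelled (PU) calibration of Elkan and Noto that anchor-based phenotyping builds on, under stated assumptions. An *anchor* is assumed to be a binary feature (for example, a specific diagnosis code) that appears only for true cases, and whose presence is independent of the other features given the true label. Records with the anchor are treated as labelled positives, and every other record is unlabelled rather than a confirmed control. A plain logistic regression stands in for the transformer used in AnchorBERT, and the function names (`fit_logistic`, `anchor_phenotype`) are hypothetical. This is not the authors' implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, lr=0.5, epochs=500):
    """Fit p(s=1 | x) by batch gradient descent on the log loss (no regularisation).

    Stand-in classifier (assumption): the paper uses a transformer instead.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, si in zip(X, s):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - si
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def anchor_phenotype(records, anchor_index, lr=0.5, epochs=500):
    """Hypothetical helper: score every record for the phenotype from one anchor.

    Records with the anchor are labelled positives (s=1); all others are
    unlabelled (s=0), not confirmed controls.
    """
    s = [1 if r[anchor_index] else 0 for r in records]
    # Drop the anchor so the classifier must learn from the other features.
    X = [[v for j, v in enumerate(r) if j != anchor_index] for r in records]
    w, b = fit_logistic(X, s, lr, epochs)
    g = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
    # c estimates P(s=1 | y=1): the mean score over the anchored records.
    anchored = [gi for gi, si in zip(g, s) if si == 1]
    c = sum(anchored) / len(anchored)
    # Elkan-Noto correction: P(y=1 | x) = P(s=1 | x) / c, clipped at 1.
    return [min(1.0, gi / c) for gi in g], c
```

Removing the anchor from the features is the key design choice. Otherwise the classifier could predict `s` perfectly from the anchor itself, the estimate of `c` would approach 1, and unanchored cases would not be recovered.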

Code implementations: 1 repo
