ML · LG · Oct 25, 2022

Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism

arXiv:2210.13785v2 · 2 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses classification accuracy in semi-supervised settings, with applications in fields such as neuroscience and medical imaging. It is an incremental advance: it builds on prior research and relies on specific model assumptions.

The paper tackles semi-supervised learning with missing class labels by analyzing a generative model that includes a missing-data mechanism. It shows that the resulting classifier can outperform a fully supervised classifier under specific conditions, such as moderate to low class overlap or few missing labels, and that it outperforms a classifier with no missing-data mechanism regardless of the overlap or the proportion of missing labels.

Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that, in a partially classified sample, a classifier using the Bayes rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially when both the class overlap and the proportion of missing class labels are moderate to low, or when the overlap is large but few labels are missing. It also outperforms a classifier with no missing-data mechanism regardless of the degree of overlap or the proportion of missing class labels. Simulations with two- and three-component normal mixture models with unequal covariances further corroborate these findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
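To make the setting concrete, here is a minimal Python sketch of the Bayes rule of allocation for a two-class normal homoscedastic model, together with one illustrative missingness mechanism. The linear discriminant is the standard form for this model. The mechanism is an assumption made for illustration only: the text above does not give its functional form, so the sketch uses a hypothetical logistic function of the posterior entropy, under which labels are more likely to be missing near the decision boundary. The function names and the parameters `xi0` and `xi1` are likewise hypothetical.

```python
import numpy as np


def bayes_rule(y, pi1, mu1, mu2, sigma):
    """Allocate each row of y to class 1 or 2 under a two-class
    normal model with common covariance matrix sigma."""
    beta = np.linalg.solve(sigma, mu1 - mu2)           # Sigma^{-1} (mu1 - mu2)
    beta0 = np.log(pi1 / (1.0 - pi1)) - 0.5 * beta @ (mu1 + mu2)
    d = beta0 + y @ beta                               # log posterior odds of class 1
    labels = np.where(d > 0, 1, 2)
    return labels, d


def posterior_entropy(d):
    """Shannon entropy of the two posterior class probabilities."""
    tau1 = 1.0 / (1.0 + np.exp(-d))
    tau = np.clip(np.stack([tau1, 1.0 - tau1]), 1e-12, 1.0)
    return -(tau * np.log(tau)).sum(axis=0)


def missing_prob(d, xi0, xi1):
    """HYPOTHETICAL mechanism (an assumption, not from the source):
    probability that a label is missing, logistic in posterior entropy."""
    return 1.0 / (1.0 + np.exp(-(xi0 + xi1 * posterior_entropy(d))))


# Usage: two well-separated classes in two dimensions.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
y = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 5.0]])
labels, d = bayes_rule(y, 0.5, mu1, mu2, np.eye(2))
m = missing_prob(d, xi0=-2.0, xi1=3.0)
```

With `xi1 > 0`, the third point, which lies on the decision boundary, receives the highest probability of a missing label, while the two points far from the boundary receive equal, lower probabilities.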

