ML · LG · Dec 3, 2021

Bayes in Wonderland! Predictive supervised classification inference hits unpredictability

arXiv:2112.01880v1 · 2 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a theoretical and practical issue in Bayesian classification for statisticians and machine learning researchers, offering tools for parameter estimation and hypothesis testing, though it is incremental in extending existing exchangeability concepts.

The paper tackles the problem of predictive supervised classification under partition exchangeability, where traditional marginal Bayesian classifiers may fail even with large training data. It provides a computational scheme to generate such sequences and demonstrates convergence between marginal and simultaneous classifiers, enabling the use of simpler, more efficient marginal classifiers.

Marginal Bayesian predictive classifiers (mBpc), as opposed to simultaneous Bayesian predictive classifiers (sBpc), handle each data point separately and hence tacitly assume independence of the observations. However, because learning of the generative model parameters saturates, the adverse effect of this false assumption on the accuracy of mBpc tends to wear off as the amount of training data increases, which guarantees the convergence of the two classifiers under de Finetti type exchangeability. This result is, however, far from trivial for sequences generated under partition exchangeability (PE), where even an arbitrarily large amount of training data does not rule out the possibility of an unobserved outcome (Wonderland!). We provide a computational scheme that allows the generation of sequences under PE. Based on that, with a controlled increase of the training data, we show the convergence of the sBpc and mBpc. This supports the use of the simpler and computationally more efficient marginal classifiers instead of the simultaneous ones. We also provide a parameter estimation of the generative model giving rise to the partition exchangeable sequence, as well as a testing paradigm for the equality of this parameter across different samples. The package for Bayesian predictive supervised classification, parameter estimation and hypothesis testing of the Ewens sampling formula generative model is deposited on CRAN as the PEkit package and is freely available from https://github.com/AmiryousefiLab/PEkit.
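The abstract does not spell out the generation scheme or the estimator, so the following is only a minimal sketch under stated assumptions, not the paper's method and not the PEkit API. It assumes the standard Hoppe urn construction of the Ewens sampling formula, with a dispersal parameter called `theta` here: the item at position i (counting from 0) is a previously unseen label with probability theta / (theta + i), which never reaches zero and so illustrates why an unobserved outcome always remains possible. The function names `hoppe_urn_sequence`, `expected_distinct` and `estimate_theta` are hypothetical. The estimator uses the known fact that, under the Ewens sampling formula, the maximum likelihood estimate of theta solves the equation "expected number of distinct labels = observed number of distinct labels".

```python
import random


def hoppe_urn_sequence(n, theta, seed=None):
    """Draw n labels from a Hoppe urn with dispersal parameter theta > 0."""
    rng = random.Random(seed)
    labels = []
    next_label = 0
    for i in range(n):
        # A brand-new label appears with probability theta / (theta + i).
        if rng.random() < theta / (theta + i):
            labels.append(next_label)
            next_label += 1
        else:
            # Otherwise copy a uniformly chosen earlier item, so an existing
            # label is picked in proportion to its current count.
            labels.append(rng.choice(labels))
    return labels


def expected_distinct(n, theta):
    """Expected number of distinct labels among n items under the urn."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta(n, k, iters=100):
    """Maximum likelihood estimate of theta from k distinct labels in n items.

    Solves expected_distinct(n, theta) = k by bisection; a unique root
    exists only when 1 < k < n.
    """
    if not 1 < k < n:
        raise ValueError("the estimate is finite and positive only for 1 < k < n")
    lo = hi = 1.0
    while expected_distinct(n, lo) > k:  # shrink until below the target
        lo /= 2.0
    while expected_distinct(n, hi) < k:  # grow until above the target
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_distinct(n, mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Usage: simulate a sequence, then recover theta from its number of distinct labels.
sequence = hoppe_urn_sequence(500, theta=5.0, seed=1)
theta_hat = estimate_theta(len(sequence), len(set(sequence)))
print(len(set(sequence)), theta_hat)
```

Bisection is used here only because the expected count is monotone in theta; any one-dimensional root finder would serve equally well.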

Code Implementations: 1 repo