Factoring Multidimensional Data to Create a Sophisticated Bayes Classifier
This work aims to improve classification accuracy in machine learning by choosing how the joint distribution of a dataset is factored, though it appears incremental, as it builds on existing Bayesian methods.
The paper tackles the problem of selecting the best factorization of a categorical dataset by deriving an explicit formula for calculating marginal likelihoods, which are used to order factorizations and construct a Bayes classifier that leverages independent variable sets.
In this paper we derive an explicit formula for calculating the marginal likelihood of a given factorization of a categorical dataset. Since the marginal likelihood is proportional to the posterior probability of the factorization (assuming all factorizations are equally likely a priori), these likelihoods can be used to order all possible factorizations and select the "best" way to factor the overall distribution from which the dataset is drawn. The best factorization can then be used to construct a Bayes classifier that benefits from factoring out mutually independent sets of variables.
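To make the ranking step concrete, the following is a minimal sketch rather than the formula derived in the paper. It assumes that each block of variables in a factorization has its own joint categorical distribution with a symmetric Dirichlet prior (the standard Dirichlet-multinomial marginal likelihood), that blocks are independent, and that all factorizations are equally likely a priori. The names `block_log_ml`, `partitions`, `rank_factorizations`, and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import lgamma, prod


def block_log_ml(rows, block, cards, alpha=1.0):
    """Log marginal likelihood of one block of variables.

    Assumption: the block's joint states follow a categorical distribution
    with a symmetric Dirichlet(alpha) prior (Dirichlet-multinomial model).
    """
    k = prod(cards[v] for v in block)  # number of joint states in the block
    counts = Counter(tuple(r[v] for v in block) for r in rows)
    n = len(rows)
    out = lgamma(k * alpha) - lgamma(n + k * alpha)
    # States never observed contribute lgamma(alpha) - lgamma(alpha) = 0.
    out += sum(lgamma(c + alpha) - lgamma(alpha) for c in counts.values())
    return out


def partitions(items):
    """Yield every set partition of a list (each one is a factorization)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + p


def rank_factorizations(rows, cards, alpha=1.0):
    """Return (log marginal likelihood, factorization) pairs, best first."""
    scored = []
    for p in partitions(list(range(len(cards)))):
        # Independent blocks: the log marginal likelihoods add.
        score = sum(block_log_ml(rows, b, cards, alpha) for b in p)
        scored.append((score, sorted(sorted(b) for b in p)))
    return sorted(scored, reverse=True)


# Toy data (illustrative only): variable 1 copies variable 0,
# while variable 2 varies independently of both.
rows = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
best_score, best_factorization = rank_factorizations(rows, cards=[2, 2, 2])[0]
print(best_factorization)
```

Exhaustive enumeration is only practical for a handful of variables, since the number of set partitions grows as the Bell numbers. Under the same assumptions, a classifier could treat the class label as one more variable and predict from the block containing it, ignoring the blocks judged independent of the label.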