AI · Jul 16, 2017

Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data

arXiv:1707.04943v3 · 10 citations
Originality: Incremental advance
AI Analysis

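Training on optimised artificial data improves the accuracy of naive Bayes regression models, which are simple and interpretable but often lag behind complex black-box models. Using the real data only indirectly, to construct the artificial data, is a novel twist on the usual training paradigm.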

The paper tackles the problem of naive Bayes regression underperformance by generating artificial surrogate training data via population-based optimization, resulting in improved generalization performance compared to training on real data.

Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.
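The idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's actual setup: the model is a simplified naive Bayes regressor (the target is binned and each feature gets one Gaussian per bin), the optimiser is a plain (mu + lambda) evolution strategy, and the fitness is the error on the real training data of a model trained only on the artificial points. The names `fit_predict_nb`, `fitness` and `evolve_surrogate` are hypothetical.

```python
import numpy as np

def fit_predict_nb(X_train, y_train, X_test, n_bins=5):
    """Simplified naive Bayes regression (illustrative stand-in): bin the
    target, fit one Gaussian per (bin, feature), and predict the
    posterior-weighted mean target of the bins."""
    edges = np.quantile(y_train, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y_train, edges)
    log_post, centres = [], []
    for b in np.unique(bins):
        Xb, yb = X_train[bins == b], y_train[bins == b]
        mu = Xb.mean(axis=0)
        sd = Xb.std(axis=0) + 1e-3  # floor so single-point bins stay finite
        log_lik = -0.5 * (((X_test - mu) / sd) ** 2).sum(axis=1) - np.log(sd).sum()
        log_post.append(np.log(len(yb) / len(y_train)) + log_lik)
        centres.append(yb.mean())
    log_post = np.array(log_post)                 # shape: (bins, test points)
    w = np.exp(log_post - log_post.max(axis=0))   # stable softmax over bins
    w /= w.sum(axis=0)
    return w.T @ np.array(centres)

def fitness(candidate, X_real, y_real):
    """RMSE on the real data of a model trained only on the candidate."""
    X_art, y_art = candidate[:, :-1], candidate[:, -1]
    pred = fit_predict_nb(X_art, y_art, X_real)
    return float(np.sqrt(np.mean((pred - y_real) ** 2)))

def evolve_surrogate(X_real, y_real, n_points=10, pop_size=20,
                     n_gen=50, sigma=0.1, seed=0):
    """(mu + lambda) evolution strategy over whole artificial datasets.
    Each individual is an (n_points, n_features + 1) array; the last
    column holds the artificial target values."""
    rng = np.random.default_rng(seed)
    real = np.column_stack([X_real, y_real])
    lo, hi = real.min(axis=0), real.max(axis=0)
    pop = rng.uniform(lo, hi, size=(pop_size, n_points, real.shape[1]))
    scores = np.array([fitness(c, X_real, y_real) for c in pop])
    for _ in range(n_gen):
        # Mutate every individual with Gaussian noise scaled to the data range.
        children = pop + rng.normal(0, sigma, pop.shape) * (hi - lo)
        child_scores = np.array([fitness(c, X_real, y_real) for c in children])
        both = np.concatenate([pop, children])
        both_scores = np.concatenate([scores, child_scores])
        keep = np.argsort(both_scores)[:pop_size]  # elitist survivor selection
        pop, scores = both[keep], both_scores[keep]
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])
```

Calling `evolve_surrogate(X, y)` returns the best artificial dataset found and its fitness. Because the real data enters only through the fitness function, the final model never trains on it directly, which is the indirection the abstract describes. A fair comparison with training on the real data would need a held-out test set, which this sketch leaves out.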

