LGAIMLFeb 20, 2013

Estimating Continuous Distributions in Bayesian Classifiers

arXiv:1302.4964v13784 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of modeling continuous distributions in Bayesian networks for machine learning practitioners, offering a more flexible alternative to discretization or Gaussian assumptions.

The paper tackles the problem of handling continuous variables in Bayesian classifiers by abandoning the normality assumption and using nonparametric kernel density estimation, resulting in large error reductions on various natural and artificial data sets.

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes