ML | LG | Jan 31, 2024

Variable selection for Naïve Bayes classification

arXiv:2401.18039v1 | 88 citations | h-index: 31 | Comput Oper Res
Originality: Incremental advance
AI Analysis

This work addresses performance and interpretability issues in Naïve Bayes classification for datasets with correlated or numerous features, but it is incremental as it builds on existing feature selection methods.

The authors address feature selection for Naïve Bayes classification by proposing a sparse version that accounts for feature correlations and allows flexible performance measures. It achieves competitive accuracy, sparsity, and running times on balanced datasets, and a better compromise between class-wise classification rates on unbalanced ones.

Naïve Bayes has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, which violates the Naïve Bayes assumption of conditional independence and may degrade the method's performance. Moreover, datasets often contain a large number of features, which may complicate the interpretation of the results and slow down the method's execution. In this paper we propose a sparse version of the Naïve Bayes classifier that is characterized by three properties. First, sparsity is achieved by taking into account the correlation structure of the covariates. Second, different performance measures can be used to guide the selection of features. Third, performance constraints on groups of higher interest can be included. Our proposal leads to a smart search that yields competitive running times while integrating flexibility in the choice of performance measure for classification. Our findings show that, when compared against well-referenced feature selection approaches, the proposed sparse Naïve Bayes obtains competitive results regarding accuracy, sparsity and running times for balanced datasets. For datasets with unbalanced classes (or classes of different importance), a better compromise between the classification rates for the different classes is achieved.
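To make the idea of measure-guided feature selection concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it swaps in a simple greedy forward search around a Gaussian Naïve Bayes, scores candidates on the training data, and does not model the correlation structure explicitly. The function names (`fit_gaussian_nb`, `predict`, `greedy_select`), the `min_recall` argument, and the toy data are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, features):
    """Estimate log priors and per-feature Gaussian (mean, variance) per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = {}
        for j in features:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats[j] = (mean, var)
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model

def predict(model, row):
    """Pick the class maximising log prior + sum of per-feature log densities."""
    def score(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (row[j] - mean) ** 2 / (2 * var)
            for j, (mean, var) in stats.items()
        )
    return max(model, key=score)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, label):
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits) if hits else 0.0

def greedy_select(X, y, measure=accuracy, min_recall=None):
    """Forward selection guided by `measure` (assumed illustrative search).

    min_recall: optional {label: threshold}; candidate feature sets whose
    recall on a listed class falls below its threshold are skipped.
    """
    selected, best = [], -math.inf
    remaining = set(range(len(X[0])))
    while remaining:
        candidates = []
        for j in sorted(remaining):
            model = fit_gaussian_nb(X, y, selected + [j])
            pred = [predict(model, r) for r in X]
            if min_recall and any(
                recall(y, pred, c) < t for c, t in min_recall.items()
            ):
                continue
            candidates.append((measure(y, pred), j))
        if not candidates:
            break
        # Highest score wins; ties go to the lowest feature index.
        score, j = max(candidates, key=lambda c: (c[0], -c[1]))
        if score <= best:
            break  # no strict improvement: stop and keep the model sparse
        selected.append(j)
        best = score
        remaining.remove(j)
    return selected, best

# Toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is a near copy of feature 0 (i.e. strongly correlated with it).
X = [
    [1.0, 5.0, 1.1], [1.2, 3.0, 1.3], [0.8, 4.0, 0.9], [1.1, 6.0, 1.0],
    [3.0, 5.5, 3.1], [3.2, 3.5, 3.3], [2.8, 4.5, 2.9], [3.1, 6.5, 3.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, best = greedy_select(X, y)
print(selected, best)
```

On this toy data the search keeps a single feature: once feature 0 is selected, adding its near copy (feature 2) or the noise feature brings no strict improvement, so the search stops. Passing a different `measure`, or a `min_recall` dictionary for a class of higher interest, changes which candidates are admissible. A real evaluation would score candidates on held-out data rather than on the training set.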

