LG, Mar 8, 2014

Improving Performance of a Group of Classification Algorithms Using Resampling and Feature Selection

arXiv:1403.1946v1, 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge data miners face in extracting meaningful patterns from large datasets. It appears incremental, because it combines existing techniques into a new hybrid approach.

The authors tackled the problem of improving classification performance on high-dimensional data by proposing a hybrid feature selection method combining resampling, filtering, and wrapper evaluation with genetic search, applied to a Lung-Cancer dataset. The result was a substantial decrease in classification error and improved average performance across five algorithms, with the method outperforming other feature selection approaches at lower cost.

In recent years, finding meaningful patterns in huge datasets has become both more important and more challenging. Data miners adopt innovative methods to face this problem, among them feature selection. In this paper we propose a new hybrid method that combines resampling, filtering of the sample domain, and wrapper subset evaluation with genetic search to reduce the dimensionality of the Lung-Cancer dataset obtained from the UCI Repository of Machine Learning Databases. Finally, we apply some well-known classification algorithms (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) to the resulting dataset and compare the results and prediction rates before and after applying our feature selection method. The results show a substantial improvement in the average performance of all five classification algorithms simultaneously, and the classification error of these classifiers decreases considerably. The experiments also show that this method outperforms other feature selection methods at a lower cost.
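The combination of resampling and wrapper subset evaluation with genetic search can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the nearest-centroid stand-in classifier, the size penalty of 0.01 per feature, and all function names (`make_data`, `resample`, `centroid_accuracy`, `genetic_search`) are assumptions made for illustration, and the filtering step is omitted.

```python
import random

def make_data(n=60, n_features=12, informative=(0, 3), seed=0):
    """Hypothetical two-class data: only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift class 1 on the informative columns
        X.append(row)
        y.append(label)
    return X, y

def resample(X, y, size, rng):
    """Bootstrap resampling: draw `size` instances with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(size)]
    return [X[i] for i in idx], [y[i] for i in idx]

def centroid_accuracy(X, y, mask, folds=3):
    """Wrapper score: cross-validated accuracy of a nearest-centroid
    classifier that sees only the columns where `mask` is True."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        centroids = {}
        for c in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)

def genetic_search(X, y, n_features, rng, pop_size=20, generations=15, p_mut=0.05):
    """Genetic search over boolean feature masks, scored by the wrapper."""
    def fitness(mask):
        # Assumed penalty that favours smaller subsets.
        return centroid_accuracy(X, y, mask) - 0.01 * sum(mask)

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # elitism: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = parents + children
    return max(pop, key=fitness)

rng = random.Random(1)
X, y = make_data()
Xs, ys = resample(X, y, len(X), rng)
best = genetic_search(Xs, ys, len(X[0]), rng)
print("selected columns:", [j for j, keep in enumerate(best) if keep])
print("accuracy, all features:", centroid_accuracy(Xs, ys, [True] * len(X[0])))
print("accuracy, selected:", centroid_accuracy(Xs, ys, best))
```

One caveat about this sketch: because bootstrap resampling duplicates instances, copies of the same instance can land in both the training and test folds, so the cross-validated score on the resampled data is optimistic. Scoring the chosen subset on held-out data that was never resampled gives a fairer comparison.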

