LG MLOct 19, 2012

A Distance-Based Branch and Bound Feature Selection Algorithm

arXiv:1212.2488v17 citations

Originality Incremental advance

AI Analysis

This addresses the feature selection bottleneck in machine learning for domains like bioinformatics, but it is incremental as it builds on existing Branch and Bound techniques.

The paper tackles the problem of selecting k Gaussian features to minimize Bayesian classification error, which lacks efficient optimal methods, and presents a Branch and Bound algorithm that uses monotonic distance measures to find optimal subsets, tested on synthetic and gene expression data.

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling.

View on arXiv PDF

Similar