Pre-Selection of Independent Binary Features: An Application to Diagnosing Scrapie in Sheep
This work addresses feature selection in diagnostic settings such as veterinary medicine, but it is incremental: it applies existing methods to a specific domain without introducing new techniques.
The paper tackles the problem of selecting a subset of binary features for multi-class classification when only expert estimates of the class-conditional probabilities are available. Assuming conditional independence and using the Naive Bayes classifier, it demonstrates the approach on Scrapie diagnosis in sheep, showing that Sequential Forward Selection can be used for feature selection, with a sensitivity analysis to assess the robustness of the selected subset.
Suppose that the only information available in a multi-class problem is a set of expert estimates of the conditional probabilities of occurrence for a set of binary features. The aim is to select a subset of features to be measured in subsequent data collection experiments. In the absence of any information about dependencies between the features, we assume that all features are conditionally independent given the class, and hence choose the Naive Bayes classifier as the optimal classifier for the problem. Even in this (seemingly trivial) case of complete knowledge of the distributions, choosing an optimal feature subset is not straightforward. We discuss the properties and implementation details of Sequential Forward Selection (SFS) as a feature selection procedure for this problem. A sensitivity analysis is carried out to investigate whether the same features are selected when the probabilities vary around the estimated values. The procedure is illustrated with a set of probability estimates for Scrapie in sheep.
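The abstract does not state which criterion SFS optimises. A natural choice, assumed in the sketch below, is the exact classification accuracy of the Naive Bayes classifier computed directly from the probability estimates. Because the features are binary and conditionally independent given the class, the accuracy of a subset S of size k can be obtained by enumerating all 2^k feature vectors: accuracy(S) = sum over x of max_c P(c) * prod_{j in S} P(x_j | c). The function names (bayes_accuracy, sfs, sensitivity), the equal class priors, the uniform +/- delta perturbation scheme and all probability values are hypothetical illustrations, not the paper's implementation or its Scrapie estimates.

```python
import itertools
import random


def bayes_accuracy(p, priors, subset):
    """Exact accuracy of the Naive Bayes (here Bayes-optimal) classifier that
    uses only the features in `subset`.

    p[c][j] is the estimated P(feature j = 1 | class c); features are assumed
    conditionally independent given the class.
    """
    acc = 0.0
    # Enumerate every binary vector over the selected features (2**len(subset)).
    for x in itertools.product((0, 1), repeat=len(subset)):
        joints = []
        for c, prior in enumerate(priors):
            prob = prior
            for xj, j in zip(x, subset):
                prob *= p[c][j] if xj else 1.0 - p[c][j]
            joints.append(prob)  # P(class c, x)
        # The classifier picks argmax_c P(c, x), so on this vector it is
        # correct with probability max_c P(c, x).
        acc += max(joints)
    return acc


def sfs(p, priors, k):
    """Sequential Forward Selection: starting from the empty set, repeatedly
    add the feature that maximises the exact accuracy until k are chosen.
    Ties are broken in favour of the lowest feature index."""
    n_features = len(p[0])
    selected = []
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        best = max(remaining,
                   key=lambda j: bayes_accuracy(p, priors, selected + [j]))
        selected.append(best)
    return selected


def sensitivity(p, priors, k, delta=0.05, trials=100, seed=0):
    """Hypothetical sensitivity check: perturb every probability uniformly
    within +/- delta (clipped to [0.01, 0.99]), rerun SFS, and report the
    fraction of trials that select the same feature set as the baseline."""
    rng = random.Random(seed)
    base = sfs(p, priors, k)
    same = 0
    for _ in range(trials):
        q = [[min(0.99, max(0.01, v + rng.uniform(-delta, delta))) for v in row]
             for row in p]
        if set(sfs(q, priors, k)) == set(base):
            same += 1
    return base, same / trials


if __name__ == "__main__":
    # Made-up estimates for 3 classes x 5 binary features (NOT the Scrapie data).
    p = [
        [0.90, 0.20, 0.60, 0.50, 0.30],
        [0.20, 0.80, 0.55, 0.50, 0.35],
        [0.30, 0.25, 0.10, 0.50, 0.90],
    ]
    priors = [1 / 3] * 3  # assumption: equal class priors
    chosen = sfs(p, priors, 3)
    print("SFS order:", chosen)
    print("accuracy:", round(bayes_accuracy(p, priors, chosen), 4))
    print("robustness (set, fraction unchanged):", sensitivity(p, priors, 3))
```

Since no data are available, the sketch scores subsets with the exact accuracy implied by the estimates rather than with an empirical estimate. Each evaluation enumerates 2^k vectors, so this is only practical for small subsets, and SFS, being greedy, is not guaranteed to find the optimal subset of a given size.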