Theory of Optimal Bayesian Feature Filtering
This work provides foundational theoretical guarantees for a feature selection method in bioinformatics, but it is incremental because it builds on existing Bayesian frameworks.
The paper addresses the theoretical justification of optimal Bayesian feature filtering (OBF) for biomarker discovery. It proves that OBF is optimal if and only if the underlying Bayesian model assumes mutually independent features, and it shows that OBF under independent Gaussian models is consistent under mild conditions, including when the data are non-Gaussian with correlated features.
Optimal Bayesian feature filtering (OBF) is a supervised screening method designed for biomarker discovery. In this article, we prove two major theoretical properties of OBF. First, optimal Bayesian feature selection under a general family of Bayesian models reduces to filtering if and only if the underlying Bayesian model assumes all features are mutually independent. Therefore, OBF is optimal if and only if one assumes all features are mutually independent, and OBF is the only filter method that is optimal under at least one model in the general Bayesian framework. Second, OBF under independent Gaussian models is consistent under very mild conditions, including cases where the data is non-Gaussian with correlated features. This result provides conditions where OBF is guaranteed to identify the correct feature set given enough data, and it justifies the use of OBF in non-design settings where its assumptions are invalid.
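To make the filtering idea concrete, the following is a minimal sketch of a generic filter method: each feature is scored on its own, and the top-scoring features are kept. It does not implement the OBF posterior. The score used here (an absolute Welch t-statistic) is a stand-in chosen for illustration, and the function names, the simulated data, and all parameter values are assumptions that do not come from the article. The simulation deliberately uses non-Gaussian (uniform) noise with a shared factor that correlates the features, to illustrate the kind of setting in which a per-feature screen can still recover the informative features when enough data are available.

```python
import random
from statistics import mean, variance


def welch_score(a, b):
    """Absolute Welch t-statistic for one feature (stand-in score, not OBF's)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se


def filter_features(X, y, k):
    """Score every feature independently and return the indices of the k best."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(welch_score(a, b))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def simulate(n, n_features=10, informative=(0, 1), shift=1.0, seed=0):
    """Hypothetical two-class data: uniform noise plus a shared factor."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes
        shared = rng.uniform(-1, 1)  # common factor, makes features correlated
        row = []
        for j in range(n_features):
            value = shared + rng.uniform(-1, 1)  # non-Gaussian noise
            if label == 1 and j in informative:
                value += shift  # class-dependent mean for informative features
            row.append(value)
        X.append(row)
        y.append(label)
    return X, y


X, y = simulate(2000)
selected = filter_features(X, y, k=2)
print(selected)
```

Because each feature is scored without reference to the others, the cost grows linearly with the number of features, which is what makes filter methods practical for large screening problems. In this sketch the number of features to keep, k, is fixed by hand.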