MDFS - MultiDimensional Feature Selection
This work addresses variable selection for data analysts by providing a tool that identifies relevant variables more accurately. The contribution is incremental, since it builds on existing information-theoretic methods.
The authors tackle the problem of identifying informative variables in datasets by accounting for synergistic interactions, which one-dimensional filtering methods often miss. On the Madelon dataset, they demonstrate that their multidimensional approach is more sensitive and yields more reliable variable importance rankings.
Identification of informative variables in an information system is often performed using simple one-dimensional filtering procedures that discard information about interactions between variables. Such an approach may remove some relevant variables from consideration. Here we present an R package, MDFS (MultiDimensional Feature Selection), that identifies informative variables while taking into account synergistic interactions between multiple descriptors and the decision variable. MDFS is an implementation of an algorithm based on information theory. The computational kernel of the package is implemented in C++. A high-performance version implemented in CUDA C is also available. The applications of MDFS are demonstrated using the well-known Madelon dataset, which has synergistic variables by design. The dataset comes from the UCI Machine Learning Repository. It is shown that multidimensional analysis is more sensitive than one-dimensional tests and returns more reliable rankings of importance.
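
To illustrate why a one-dimensional filter can miss synergistic variables, the following minimal Python sketch builds a toy XOR problem and compares the information a variable carries about the decision on its own with the information it adds once a second variable is known. It is not the MDFS algorithm or its API. The function names (`entropy`, `conditional_entropy`, `info_gain_1d`, `info_gain_2d`) and the synthetic data are assumptions made for this illustration, and the sketch omits the discretization and statistical testing that a real analysis would need.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(y, x):
    """H(y | x), where x is a sequence of hashable values (e.g. tuples)."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain_1d(y, x):
    """One-dimensional information gain: I(y; x) = H(y) - H(y | x)."""
    return entropy(y) - conditional_entropy(y, x)


def info_gain_2d(y, x, context):
    """Information x adds about y given `context`: H(y | context) - H(y | x, context)."""
    joint = list(zip(x, context))
    return conditional_entropy(y, context) - conditional_entropy(y, joint)


# Hypothetical toy data: the decision is the XOR of two binary variables,
# plus one irrelevant noise variable.
rng = random.Random(0)
n = 2000
a = [rng.randint(0, 1) for _ in range(n)]
b = [rng.randint(0, 1) for _ in range(n)]
noise = [rng.randint(0, 1) for _ in range(n)]
y = [ai ^ bi for ai, bi in zip(a, b)]

ig1_a = info_gain_1d(y, a)       # a alone says almost nothing about y
ig2_a = info_gain_2d(y, a, b)    # a together with b determines y
ig2_noise = max(info_gain_2d(y, noise, a), info_gain_2d(y, noise, b))

print(f"1D gain of a:      {ig1_a:.4f} bits")
print(f"2D gain of a (+b): {ig2_a:.4f} bits")
print(f"2D gain of noise:  {ig2_noise:.4f} bits")
```

In this toy setting a one-dimensional test would rank `a` alongside the noise variable, whereas the two-dimensional gain separates them clearly. This is the kind of synergy that the Madelon dataset contains by design.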