Contingency Training
This paper addresses the problem of irrelevant variables that remain in high-dimensional data after feature selection, offering machine learning practitioners an incremental improvement in classifier robustness.
The paper tackles the problem of classifiers being affected by irrelevant variables even after feature selection. It introduces Contingency Training, a classifier-independent method that improves accuracy and robustness by subsampling and removing information from each sample in order to assign feature importance weights. Experiments show that it outperforms unmodified training on datasets with irrelevant variables and performs slightly better on datasets with few or none.
When applied to high-dimensional datasets, feature selection algorithms may still leave dozens of irrelevant variables in the dataset. Therefore, even after feature selection has been applied, classifiers must be prepared for the presence of irrelevant variables. This paper investigates a new training method called Contingency Training, which increases both accuracy and robustness against irrelevant attributes. Contingency training is classifier independent. By subsampling and removing information from each sample, it creates a set of constraints. These constraints help the method automatically find proper importance weights for the dataset's features. Experiments are conducted with contingency training applied to neural networks on traditional datasets as well as on datasets with additional irrelevant variables. In all tests, contingency training surpassed unmodified training on datasets with irrelevant variables and even slightly outperformed it when only a few or no irrelevant variables were present.
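The abstract does not say how information is removed from a sample or how the resulting constraints are formed, so the following is only a rough sketch of the "subsample and remove information" step, not the paper's algorithm. It assumes that removal means overwriting a random subset of features with a fill value and that each reduced copy keeps the label of its original sample. The names `masked_copies` and `expand_dataset` and the parameters `n_copies`, `drop_fraction`, and `fill_value` are hypothetical and chosen for illustration.

```python
import random


def masked_copies(sample, n_copies=3, drop_fraction=0.5, fill_value=0.0, rng=None):
    """Return n_copies versions of `sample`, each with a random feature subset removed.

    Assumption: "removing information" is modelled as replacing the chosen
    features with `fill_value`. The paper may use a different mechanism.
    """
    rng = rng or random.Random()
    n_features = len(sample)
    n_drop = int(n_features * drop_fraction)
    copies = []
    for _ in range(n_copies):
        # Pick a different random subset of feature indices for every copy.
        dropped = set(rng.sample(range(n_features), n_drop))
        copies.append(
            [fill_value if i in dropped else value for i, value in enumerate(sample)]
        )
    return copies


def expand_dataset(X, y, **kwargs):
    """Return the original samples followed by their masked copies.

    Assumption: every masked copy inherits the label of its source sample,
    so a classifier trained on the result is asked to give the same answer
    from partial information.
    """
    X_out, y_out = [], []
    for sample, label in zip(X, y):
        X_out.append(list(sample))
        y_out.append(label)
        for copy in masked_copies(sample, **kwargs):
            X_out.append(copy)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    y = [0, 1]
    X_aug, y_aug = expand_dataset(X, y, n_copies=2, rng=random.Random(0))
    for row, label in zip(X_aug, y_aug):
        print(label, row)
```

Because the expansion happens on the data rather than inside the model, the result can be passed to any classifier's ordinary training routine, which is consistent with the classifier-independent framing above. How the feature importance weights are derived from such copies is not described in the abstract and is left out of the sketch.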