Feature selection with test cost constraint
This work addresses the practical issue of limited resources in feature acquisition for machine learning applications, offering an incremental improvement by redefining existing rough set problems from a constraint satisfaction perspective.
The paper tackles the problem of selecting informative yet affordable features under resource constraints. It formulates feature selection with test cost as a constraint satisfaction problem and proposes a backtracking algorithm for medium-sized data, plus a heuristic for large datasets that found the optimal solution in most experimental cases.
Feature selection is an important preprocessing step in machine learning and data mining. In real-world applications, acquiring features incurs costs in money, time, and other resources. When resources are limited, these test costs are subject to a constraint, and we must deliberately select a feature subset that is both informative and cheap for classification. This paper proposes the feature selection with test cost constraint problem to address this issue. The new problem has a simple form when described as a constraint satisfaction problem (CSP). Backtracking is a general algorithm for CSPs, and it is efficient in solving the new problem on medium-sized data. Because the backtracking algorithm does not scale to large datasets, a heuristic algorithm is also developed. Experimental results show that the heuristic algorithm can find the optimal solution in most cases. We also redefine some existing feature selection problems in rough sets, especially in decision-theoretic rough sets, from the viewpoint of CSP. These new definitions provide insight into some new research directions.
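To make the backtracking idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that subset quality is measured by the size of the rough-set positive region (the number of objects whose values on the chosen features determine the class uniquely), and that ties are broken in favor of the cheaper subset. The abstract states neither choice. The names `positive_region_size` and `select_features` and the toy data are hypothetical.

```python
from collections import defaultdict


def positive_region_size(rows, labels, features):
    """Count objects whose values on `features` map to exactly one class."""
    classes = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        classes[key].add(label)
        counts[key] += 1
    return sum(n for key, n in counts.items() if len(classes[key]) == 1)


def select_features(rows, labels, costs, budget):
    """Backtracking search over feature subsets with total cost <= budget.

    Assumption: maximize the positive region, then minimize total test cost.
    """
    n = len(costs)
    best = {"score": -1, "cost": 0, "subset": ()}

    def backtrack(start, chosen, spent):
        score = positive_region_size(rows, labels, chosen)
        better = score > best["score"]
        cheaper_tie = score == best["score"] and spent < best["cost"]
        if better or cheaper_tie:
            best.update(score=score, cost=spent, subset=tuple(chosen))
        for f in range(start, n):
            # Constraint check: skip any branch that would exceed the budget.
            if spent + costs[f] <= budget:
                chosen.append(f)
                backtrack(f + 1, chosen, spent + costs[f])
                chosen.pop()

    backtrack(0, [], 0)
    return best["subset"], best["score"], best["cost"]


# Toy data (hypothetical): the class is the XOR of features 0 and 1,
# and feature 2 is the negation of feature 0.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]
costs = [3, 2, 1]

result = select_features(rows, labels, costs, budget=5)
print(result)  # ((1, 2), 4, 3)
```

With a budget of 5, both {0, 1} and {1, 2} separate all four objects, and the search keeps {1, 2} because it costs 3 rather than 5. The search is exhaustive apart from the budget check, so its running time grows exponentially with the number of affordable features, which is consistent with the need for a heuristic on large datasets.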