Using Constraints to Discover Sparse and Alternative Subgroup Descriptions
This work addresses the need for more interpretable and flexible subgroup descriptions in data analysis, though it is incremental as it builds on existing subgroup-discovery methods.
The paper tackles the problem of enhancing interpretability in subgroup discovery by introducing constraints for sparsity and alternative descriptions, and proposes a novel SMT formulation for solver-based search, demonstrating that heuristic methods often achieve high-quality subgroups quickly across 27 datasets.
Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare algorithmic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.