DB, AI, LG | Apr 1, 2022

Separate and conquer heuristic allows robust mining of contrast sets in classification, regression, and survival data

arXiv:2204.00497v3 | 4 citations | h-index: 20 | Has Code
Originality: Synthesis-oriented
AI Analysis

The work provides a tool for discovering group differences in areas such as medicine and economics, but it is incremental: it builds on an existing heuristic and extends it to new data types.

The paper addresses contrast set mining, the problem of identifying differences between groups. It presents RuleKit-CS, an algorithm based on the separate and conquer heuristic and generalized to regression and survival data. Experiments on over 130 datasets support its usefulness.

Identifying differences between groups is one of the most important knowledge discovery problems. The procedure, also known as contrast set mining, is applied in a wide range of areas such as medicine, industry, and economics. In the paper we present RuleKit-CS, an algorithm for contrast set mining based on separate and conquer, a well-established heuristic for decision rule induction. Multiple passes accompanied by an attribute penalization scheme provide contrast sets that describe the same examples with different attributes, which distinguishes the presented approach from standard separate and conquer. The algorithm was also generalized for regression and survival data, allowing identification of contrast sets whose label attribute or survival prognosis is consistent with the label or prognosis of the predefined contrast groups. This feature, not provided by existing approaches, further extends the usability of RuleKit-CS. Experiments on over 130 data sets from various areas and a detailed analysis of selected cases confirmed RuleKit-CS to be a useful tool for discovering differences between defined groups. The algorithm was implemented as part of the RuleKit suite, available on GitHub under the GNU AGPL 3 licence (https://github.com/adaa-polsl/RuleKit).

Keywords: contrast sets, separate and conquer, regression, survival
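
The two ideas named in the abstract, separate and conquer covering and repeated passes with attribute penalization, can be illustrated with a toy sketch. This is not the RuleKit-CS implementation and does not use the RuleKit API. Every name here (`covers`, `precision`, `grow_rule`, `mine_contrast_sets`, the `group` key) is hypothetical, the rule quality measure (precision times newly covered examples) and the penalty formula (division by `1 + penalty * uses`) are assumptions chosen for brevity, and only categorical attributes in the classification setting are handled.

```python
from collections import Counter


def covers(rule, row):
    """A rule is a dict {attribute: value}; it covers a row matching every condition."""
    return all(row[attr] == val for attr, val in rule.items())


def precision(rule, rows, target):
    """Fraction of covered rows that belong to the target group (0.0 if none covered)."""
    covered = [r for r in rows if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["group"] == target for r in covered) / len(covered)


def grow_rule(rows, uncovered, target, attrs, uses, penalty):
    """Greedily add 'attribute == value' conditions while precision improves."""
    rule = {}
    while True:
        current = precision(rule, rows, target)
        if current >= 1.0:
            break
        best, best_score = None, 0.0
        for attr in attrs:
            if attr in rule:
                continue
            # Candidate values come from still-uncovered target rows.
            for val in sorted({r[attr] for r in uncovered if covers(rule, r)}):
                cand = {**rule, attr: val}
                cand_prec = precision(cand, rows, target)
                if cand_prec <= current:
                    continue  # the condition must make the rule purer
                new_pos = sum(covers(cand, r) for r in uncovered)
                # Assumed penalty: attributes used by earlier rules score lower.
                score = cand_prec * new_pos / (1.0 + penalty * uses[attr])
                if score > best_score:
                    best, best_score = cand, score
        if best is None:
            break
        rule = best
    return rule


def mine_contrast_sets(rows, target, attrs, passes=2, penalty=1.0):
    """Run several separate and conquer passes, penalizing reused attributes."""
    uses = Counter()
    found = []
    for _ in range(passes):
        # Each pass starts again from all examples of the target group.
        uncovered = [r for r in rows if r["group"] == target]
        while uncovered:
            rule = grow_rule(rows, uncovered, target, attrs, uses, penalty)
            if not rule:
                break
            if rule not in found:
                found.append(rule)
            uses.update(rule.keys())
            # "Separate": drop the target examples the new rule covers.
            uncovered = [r for r in uncovered if not covers(rule, r)]
    return found


# Made-up toy data: both 'smoker' and 'bmi' separate the two groups perfectly.
ROWS = [
    {"smoker": "yes", "bmi": "high", "sex": "m", "group": "sick"},
    {"smoker": "yes", "bmi": "high", "sex": "f", "group": "sick"},
    {"smoker": "no", "bmi": "low", "sex": "m", "group": "healthy"},
    {"smoker": "no", "bmi": "low", "sex": "f", "group": "healthy"},
]
ATTRS = ["smoker", "bmi", "sex"]

result = mine_contrast_sets(ROWS, "sick", ATTRS, passes=2)
print(result)  # [{'smoker': 'yes'}, {'bmi': 'high'}]
```

On this toy data a single pass stops after the `smoker` rule, because it already covers every example of the target group. The second pass revisits the same examples, and the penalty on `smoker` steers it to the alternative description based on `bmi`.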

Code Implementations: 1 repo