Delta-Closure Structure for Studying Data Distribution
This work addresses pattern mining for data analysis, but it appears incremental, as it builds on existing closure-based methods.
The paper tackles the problem of understanding data distributions in binary datasets by introducing Δ-closedness, a generalization of the closure operator that defines Δ-equivalence classes. The set of these classes is partitioned into a Δ-closure structure that reveals attribute correlations. Experiments on several real-world datasets show that this structure is stable for large Δ and does not depend heavily on data sampling.
In this paper, we revisit pattern mining and study the distribution underlying a binary dataset through the closure structure, which is based on passkeys, i.e., minimum generators of equivalence classes that are robust to noise. We introduce $Δ$-closedness, a generalization of the closure operator, where $Δ$ measures how much a closed set differs from its upper neighbors in the partial order induced by closure. Each $Δ$-equivalence class has minimum and maximum elements, which allow us to characterize the distribution underlying the data. Moreover, the set of $Δ$-equivalence classes can be partitioned into the so-called $Δ$-closure structure. In particular, a $Δ$-equivalence class at a high level of this structure reveals correlations among many attributes, and the larger $Δ$ is, the more observations support these correlations. In the experiments, we study the $Δ$-closure structure of several real-world datasets and show that this structure is very stable for large $Δ$ and does not substantially depend on the data sampling used for the analysis.
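The notions of closure, equivalence class, passkey, and Δ can be made concrete on a toy dataset. The sketch below is not the paper's algorithm. It brute-forces every itemset of a hypothetical dataset, groups itemsets by their closure, takes passkeys as the minimum-size generators of each class, and computes Δ under an assumed definition: the support of a closed set minus the largest support among its upper neighbors (its minimal closed proper supersets). Under that assumption, a threshold of 1 keeps every closed set, which matches ordinary closedness. The names `DATA`, `delta`, and `delta_closed` are illustrative only.

```python
from itertools import combinations

# Hypothetical toy binary dataset: each row is the set of attributes of one object.
DATA = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"b", "c"},
    {"a", "b", "c"},
]
ATTRS = sorted(set().union(*DATA))


def extent(itemset):
    """Indices of the rows that contain every attribute of `itemset`."""
    return frozenset(i for i, row in enumerate(DATA) if itemset <= row)


def closure(itemset):
    """Intersection of all rows covering `itemset` (all attributes if none do)."""
    rows = [DATA[i] for i in extent(itemset)]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ATTRS)


def all_itemsets():
    """Every subset of ATTRS, smallest first; exponential, so toy data only."""
    for k in range(len(ATTRS) + 1):
        for combo in combinations(ATTRS, k):
            yield frozenset(combo)


def equivalence_classes():
    """Group itemsets by closure: {closed set: [itemsets with that closure]}."""
    classes = {}
    for itemset in all_itemsets():
        classes.setdefault(closure(itemset), []).append(itemset)
    return classes


def passkeys(members):
    """Minimum-cardinality generators of one equivalence class."""
    smallest = min(len(m) for m in members)
    return [m for m in members if len(m) == smallest]


def delta(closed, closed_sets):
    """Assumed Delta: support of `closed` minus the largest upper-neighbor support.

    Extents shrink as itemsets grow, so the largest support over all proper
    closed supersets is attained at an upper neighbor; scanning all of them
    gives the same value. The top element has no proper closed superset and is
    compared to an empty extent (an assumption of this sketch).
    """
    support = len(extent(closed))
    above = [len(extent(c)) for c in closed_sets if closed < c]
    return support - max(above, default=0)


def delta_closed(threshold):
    """Closed sets whose assumed Delta is at least `threshold`."""
    closed_sets = list(equivalence_classes())
    return {c for c in closed_sets if delta(c, closed_sets) >= threshold}


if __name__ == "__main__":
    classes = equivalence_classes()
    for closed, members in sorted(classes.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(closed),
              "passkeys:", [sorted(p) for p in passkeys(members)],
              "delta:", delta(closed, list(classes)))
```

Raising the threshold keeps only closed sets whose support drops sharply toward every more specific closed set; on this toy data, a threshold of 2 keeps `{a, b}` and `{a, b, c}`. Brute-force enumeration is used only for readability; real datasets require a dedicated mining algorithm.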