LG · DS · Oct 13, 2022

Delta-Closure Structure for Studying Data Distribution

arXiv:2210.06926v1 · h-index: 39
Originality: Synthesis-oriented
AI Analysis

This work addresses pattern mining for data analysis, but it appears incremental as it builds on existing closure-based methods.

The paper tackles the problem of understanding data distributions in binary datasets by introducing a generalization of closure operators called Δ-closedness, which characterizes equivalence classes and partitions them into a Δ-closure structure to reveal attribute correlations. The experiments demonstrate that this structure is stable for large Δ and not heavily dependent on data sampling.

In this paper, we revisit pattern mining and study the distribution underlying a binary dataset using the closure structure, which is based on passkeys, i.e., minimum generators in equivalence classes robust to noise. We introduce $\Delta$-closedness, a generalization of the closure operator, where $\Delta$ measures how a closed set differs from its upper neighbors in the partial order induced by closure. A $\Delta$-equivalence class has minimum and maximum elements, which allows us to characterize the distribution underlying the data. Moreover, the set of $\Delta$-equivalence classes can be partitioned into the so-called $\Delta$-closure structure. In particular, a $\Delta$-equivalence class at a high level reveals correlations among many attributes, and these correlations are supported by more observations when $\Delta$ is large. In the experiments, we study the $\Delta$-closure structure of several real-world datasets and show that this structure is very stable for large $\Delta$ and does not substantially depend on the data sampling used for the analysis.
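To make the idea of $\Delta$-closedness concrete, here is a minimal brute-force sketch on a toy dataset. It assumes a common formulation that the abstract does not spell out: an attribute set is $\Delta$-closed when adding any single missing attribute loses at least $\Delta$ supporting rows, so that $\Delta = 1$ recovers ordinary closed sets. The dataset `ROWS` and all function names are hypothetical, and this is not the authors' algorithm.

```python
from itertools import combinations

# Toy binary dataset (hypothetical): each row is the set of attributes it has.
ROWS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
]
ATTRIBUTES = sorted(set().union(*ROWS))


def extent(itemset):
    """Indices of the rows that contain every attribute of `itemset`."""
    return {i for i, row in enumerate(ROWS) if itemset <= row}


def closure(itemset):
    """Largest attribute set shared by all rows that support `itemset`."""
    rows = [ROWS[i] for i in extent(itemset)]
    if not rows:
        return set(ATTRIBUTES)
    return set.intersection(*rows)


def is_delta_closed(itemset, delta):
    """Assumed definition: every one-attribute extension loses >= delta rows.

    The full attribute set passes vacuously, since nothing can be added.
    """
    support = len(extent(itemset))
    return all(
        support - len(extent(itemset | {m})) >= delta
        for m in ATTRIBUTES
        if m not in itemset
    )


def delta_closed_sets(delta):
    """Brute-force enumeration; only viable for a handful of attributes."""
    found = []
    for size in range(len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, size):
            if is_delta_closed(set(combo), delta):
                found.append(frozenset(combo))
    return found


if __name__ == "__main__":
    for delta in (1, 2, 3):
        sets = [sorted(s) for s in delta_closed_sets(delta)]
        print(f"delta={delta}: {sets}")
```

On this toy data, raising `delta` shrinks the family of surviving sets, which illustrates the abstract's point that larger $\Delta$ keeps only patterns backed by more observations.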

