DBMay 24

Top-k Approximate Functional Dependency Discovery

arXiv:2605.2492512.1
AI Analysis

For data quality auditing, this work provides a threshold-free method to discover the most significant approximate functional dependencies, addressing output size control and cross-dataset threshold variability.

The paper introduces global top-k approximate functional dependency discovery, returning the k strongest dependencies under the μ+ measure. The TALE-Opt algorithm achieves pruning ratios up to 99.81% and speedups up to 78.81× over exhaustive search on 41 real-world datasets.

Approximate functional dependencies (AFDs) relax exact functional dependencies by tolerating a bounded degree of violation, making them suited for data quality auditing. Threshold-based discovery returns all dependencies above a user-specified cutoff, but output size is uncontrollable, the right threshold varies across datasets, and widely used measures are sensitive to LHS dimensionality. We study global top-$k$ AFD discovery, where neither the LHS nor the RHS is fixed and the $k$ strongest dependencies under $μ^+$ are returned directly. The cross-attribute comparability of $μ^+$ makes such a global ranking well-defined. We prove a Triangle Incompatibility Theorem showing that minimality, global top-$k$ ranking, and exact-$k$ output cannot simultaneously hold under any non-monotonic scoring function, justifying the removal of the minimality requirement. We present two algorithms: TALE-Base, which returns the exact global top-$k$ result by exhaustive level-wise evaluation, and TALE-Opt, which reduces computation through Apriori-style candidate generation, LHS computation reuse, and two complementary pruning rules exploiting exact FD monotonicity and an optimistic upper bound on $μ^+$. Experiments on 41 real-world datasets show that TALE-Opt achieves pruning ratios up to 99.81\% and speedups over TALE-Base up to 78.81$\times$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes