DB | LG | Mar 19, 2025

ACE: A Cardinality Estimator for Set-Valued Queries

arXiv:2503.14929v1 | h-index: 5 | Proc VLDB Endow
Originality: Incremental advance
AI Analysis

This work addresses a gap in database systems that serve applications such as information retrieval and recommender systems. The advance is incremental because it builds on existing cardinality estimation techniques.

The paper tackles cardinality estimation for set-valued data in databases, a problem that existing methods do not address well. It proposes ACE, an attention-based estimator that outperforms state-of-the-art competitors in both accuracy and efficiency on three datasets.

Cardinality estimation is a fundamental functionality in database systems. Most existing cardinality estimators focus on handling predicates over numeric or categorical data. They have largely omitted an important data type, set-valued data, which frequently occur in contemporary applications such as information retrieval and recommender systems. The few existing estimators for such data either favor high-frequency elements or rely on a partial independence assumption, which limits their practical applicability. We propose ACE, an Attention-based Cardinality Estimator for estimating the cardinality of queries over set-valued data. We first design a distillation-based data encoder to condense the dataset into a compact matrix. We then design an attention-based query analyzer to capture correlations among query elements. To handle variable-sized queries, a pooling module is introduced, followed by a regression model (MLP) to generate final cardinality estimates. We evaluate ACE on three datasets with varying query element distributions, demonstrating that ACE outperforms the state-of-the-art competitors in terms of both accuracy and efficiency.
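
The pipeline described in the abstract (compact data matrix, attention over query elements, pooling, MLP regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the names (`element_matrix`, `estimate`), the sizes, the use of plain single-head self-attention and mean pooling, and the random untrained weights that stand in for the distillation-based encoder and the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ELEMENTS = 50  # hypothetical size of the element universe
DIM = 8            # hypothetical embedding width

# Stand-in for the compact data matrix. In the paper this comes from a
# distillation-based encoder; here it is random (assumption).
element_matrix = rng.normal(size=(NUM_ELEMENTS, DIM))

# Untrained attention and MLP weights (assumption: single head, one hidden layer).
W_q, W_k, W_v = (rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(3))
W1, b1 = rng.normal(size=(DIM, 16)) / np.sqrt(DIM), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) / 4.0, np.zeros(1)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def estimate(query_elements):
    """Map a variable-sized set of element ids to one scalar output."""
    x = element_matrix[np.asarray(query_elements)]  # (n, DIM)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    weights = softmax(q @ k.T / np.sqrt(DIM))       # (n, n) pairwise interactions
    attended = weights @ v                          # (n, DIM)
    pooled = attended.mean(axis=0)                  # (DIM,) fixed size for any n
    hidden = np.maximum(pooled @ W1 + b1, 0.0)      # ReLU hidden layer
    return float((hidden @ W2 + b2)[0])
```

Because the sketch uses no positional information and pools with a mean, the output depends only on which elements are in the query, not on their order, and queries of any size reduce to the same fixed-width vector before the MLP. The weights are untrained, so the returned number carries no meaning as a cardinality.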

