DB · Mar 7

Novel Table Search [Technical Report]

arXiv:2603.07235v1
Predicted impact: top 85% in DB · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of finding novel tables in data lakes, which is a significant challenge for data scientists and analysts dealing with large and redundant datasets.

This paper addresses the problem of discovering unionable tables that contribute new information for a given query table in large-scale data lakes, formally defining it as Novel Table Search (NTS). The authors propose an efficient approximation technique called Attribute-Based Novel Table Search (ANTs), which outperforms other methods in capturing syntactic novelty across various benchmarks and achieves the lowest execution time.

Avoiding redundancy in query results has been extensively studied in relational databases and information retrieval, yet its implications for data lakes remain largely unexplored. We bridge this gap by investigating how to discover unionable tables that contribute new information for a given query table in large-scale data lakes. We formally define Novel Table Search (NTS) as the problem of finding tables that are novel with respect to a given query table and identify two desirable properties that any scoring function for NTS should satisfy. We introduce a concrete scoring mechanism designed to maximize syntactic novelty, prove that it satisfies the proposed properties, and show that the associated optimization problem is NP-hard. To address this challenge, we develop an efficient penalization-based approximation technique, Attribute-Based Novel Table Search (ANTs). We propose three additional NTS variants to achieve syntactic novelty and introduce two evaluation metrics for syntactic novelty. Through extensive experiments, we demonstrate that ANTs outperforms other methods in capturing syntactic novelty across evaluation metrics and various benchmarks, while also achieving the lowest execution time.
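
As a rough illustration only: the abstract does not specify the paper's scoring function or the ANTs algorithm, so the sketch below is not either of them. It shows one simple way to think about syntactic novelty, namely the share of a candidate table's distinct rows that do not already appear in the query table. The function names, the row-as-tuple representation, and the toy tables are all hypothetical and assume the candidate columns are already aligned with the query table.

```python
def novelty(query_rows, candidate_rows):
    """Toy novelty score: fraction of distinct candidate rows absent from the query table.

    Assumption: rows are tuples whose columns are already aligned with the query table.
    This is an illustrative stand-in, not the scoring mechanism defined in the paper.
    """
    query = set(query_rows)
    candidate = set(candidate_rows)
    if not candidate:
        return 0.0
    return len(candidate - query) / len(candidate)


def rank_by_novelty(query_rows, candidates):
    """Return candidate table names sorted from most to least novel (ties broken by name)."""
    scores = {name: novelty(query_rows, rows) for name, rows in candidates.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical query table and three hypothetical unionable candidates.
query = [("Paris", "France"), ("Rome", "Italy")]
candidates = {
    "t_dup": [("Paris", "France"), ("Rome", "Italy")],    # fully redundant
    "t_mixed": [("Paris", "France"), ("Oslo", "Norway")],  # half new
    "t_new": [("Lima", "Peru"), ("Cairo", "Egypt")],       # entirely new
}
ranking = rank_by_novelty(query, candidates)
```

A per-table score like this ignores redundancy among the returned tables themselves, which is one reason selecting a jointly novel set is harder than ranking candidates one at a time.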
