CL · DB · Mar 9, 2022

ASET: Ad-hoc Structured Exploration of Text Collections [Extended Abstract]

arXiv:2203.04663v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

ASET addresses the need for ad-hoc structured exploration of text collections and offers a practical solution for users in data analysis domains. The contribution appears incremental, however, because it builds on existing extractors.

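The paper tackles the problem of extracting structured data from text collections without predefined extraction pipelines. It proposes ASET, a system that works in two phases: existing extractors first pull candidate information nuggets from the texts, and embedding-based matching then maps those nuggets to the table the user requests. In the authors' evaluation on real-world text collections, ASET extracts structured data in high quality.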

In this paper, we propose a new system called ASET that allows users to perform structured explorations of text collections in an ad-hoc manner. The main idea of ASET is to use a new two-phase approach that first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers and then matches the extractions to a structured table definition as requested by the user based on embeddings. In our evaluation, we show that ASET is thus able to extract structured data from real-world text collections in high quality without the need to design extraction pipelines upfront.
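
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not ASET's implementation: the regex patterns stand in for real extractors such as named entity recognizers, the character-trigram `embed` function is a toy placeholder for learned embeddings, and the names `EXTRACTORS`, `extract_nuggets`, `fill_row`, and `min_score` are all hypothetical. The sketch also matches only on the extractor label, which is a simplifying assumption.

```python
import math
import re
from collections import Counter

# Phase 1 (stand-in): toy regex "extractors" in place of real NER models.
EXTRACTORS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "money": re.compile(r"\$\d+(?:\.\d+)?"),
    "name": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def extract_nuggets(text):
    """Return every (label, mention) pair found by any extractor."""
    nuggets = []
    for label, pattern in EXTRACTORS.items():
        for match in pattern.finditer(text):
            nuggets.append((label, match.group()))
    return nuggets


def embed(s):
    """Toy embedding: character-trigram counts (placeholder for a learned model)."""
    padded = f"  {s.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fill_row(text, attributes, min_score=0.3):
    """Phase 2: match extracted nuggets to user-requested attribute names.

    For each attribute, pick the nugget whose extractor label is most
    similar to the attribute name; leave the cell empty (None) when no
    nugget reaches min_score.
    """
    nuggets = extract_nuggets(text)
    row = {}
    for attr in attributes:
        attr_vec = embed(attr)
        best_value, best_score = None, min_score
        for label, mention in nuggets:
            score = cosine(attr_vec, embed(label))
            if score > best_score:
                best_value, best_score = mention, score
        row[attr] = best_value
    return row


if __name__ == "__main__":
    doc = "Alice Smith paid $42.50 on 2022-03-09."
    print(fill_row(doc, ["person name", "money amount", "event date"]))
```

The point of the split is that phase 1 runs once over the collection without knowing the target schema, while phase 2 can be repeated cheaply for each table definition a user asks for.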
