DB DS IR
Aug 5, 2014

Non-hierarchical Structures: How to Model and Index Overlaps?

arXiv:1408.1011v3
2 citations
Originality: Incremental advance
AI Analysis

This work addresses a data modeling problem faced by researchers and practitioners who work with complex digital documents: representing structures that are not hierarchical. The contribution is incremental, since it extends the existing XML data model rather than replacing it.

The paper tackles the problem of modeling and indexing overlapping structural components in digital objects, which traditional tree-based methods cannot handle. It introduces TGSA (Tree-like Graph for Structural Annotations), an extension of the XML data model, together with a construction algorithm that comes with formal proofs of validity. It also proposes an indexing technique, based on XML pre-post indexing, that handles non-hierarchical structures.

Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships.
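To make the two relationships in the abstract concrete, here is a minimal Python sketch. It is not the paper's TGSA construction or its extended index. It assumes annotations are half-open character ranges `(start, end)` and shows (a) how a pair of components can be classified as disjoint, nested, or overlapping, and (b) the classic XML pre-post numbering for a strict tree, which answers reachability (ancestor) queries with two integer comparisons. The names `relation`, `pre_post`, and `is_ancestor` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # assumed representation: half-open range [start, end)


def relation(a: Span, b: Span) -> str:
    """Classify two spans as 'disjoint', 'nested', or 'overlap'."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"  # identical spans are treated as nested here
    return "overlap"     # neither disjoint nor nested


def pre_post(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (preorder rank, postorder rank) to every node of a tree."""
    ranks: Dict[str, Tuple[int, int]] = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: str) -> None:
        pre = counters["pre"]
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child)
        ranks[node] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return ranks


def is_ancestor(ranks: Dict[str, Tuple[int, int]], u: str, v: str) -> bool:
    """In a tree, u is a proper ancestor of v iff pre(u) < pre(v) and post(u) > post(v)."""
    return ranks[u][0] < ranks[v][0] and ranks[u][1] > ranks[v][1]


# A verse line and a sentence that cross each other's boundaries.
line, sentence = (0, 10), (5, 15)
print(relation(line, sentence))

# A small strictly hierarchical document for the pre-post index.
tree = {"doc": ["line1", "line2"], "line1": ["w1", "w2"]}
ranks = pre_post(tree, "doc")
print(is_ancestor(ranks, "doc", "w2"), is_ancestor(ranks, "line2", "w1"))
```

Two components classified as `overlap` cannot both be placed as nodes of a single tree without splitting one of them, so the plain ancestor test above has no way to express their relationship. That gap is what motivates an index covering both reachability and overlapping relationships.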

