Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text
For researchers working on document navigation and browsing, this work provides a formal framework and an evaluation metric, but the finding that simple TF-IDF baselines match LLM-based methods on that metric suggests the progress is incremental.
The paper proposes methods for constructing a Hypergraph of Text (HoT) to make any document collection navigable, and introduces a new metric called effort ratio to evaluate HoT quality. Experiments show that simple TF-IDF baselines can match LLM-based methods on this metric.
One reason the Web is more useful than a simple collection of documents is that the structure created by hyperlinks enables flexible navigation from one web page to another. However, hyperlinks are typically created manually and cannot fully capture a corpus' implicit semantic structures. Is there a general way to make an arbitrary collection navigable? Recent work has formalized this problem as constructing a Hypergraph of Text (HoT), which provides a formal mathematical structure for supporting navigation and browsing. However, how to construct and evaluate a Hypergraph of Text remains a challenge. In this paper, we propose and study several methods for constructing a HoT. We also propose a novel quantitative metric, effort ratio, for evaluating the structural quality of a constructed HoT. Experimental results show that even simple TF-IDF baselines can match LLM-based methods on our proposed effort ratio metric.
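To make the idea of a TF-IDF-built HoT concrete, here is a minimal sketch. The abstract does not say how the TF-IDF baseline forms hyperedges, so the construction below is an assumption chosen for illustration, not the paper's method: each document nominates its top-scoring shared terms, and each nominated term becomes a hyperedge connecting every document that contains it. The names `tokenize`, `build_hot`, and `neighbors` are hypothetical. The effort ratio metric is not computed, because its definition is not given in this text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_hot(docs, top_k=3):
    """Build a term-labelled hypergraph over a {doc_id: text} mapping.

    Illustrative assumption, not the paper's construction: a term becomes
    a hyperedge if it is among the top_k TF-IDF terms of at least one
    document, and the hyperedge links every document containing the term.
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)

    # postings[term] = set of documents that contain the term
    postings = defaultdict(set)
    for doc_id, toks in tokens.items():
        for term in toks:
            postings[term].add(doc_id)

    selected = set()
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        scores = {}
        for term, count in tf.items():
            df = len(postings[term])
            # Skip terms unique to one document (they link nothing) and
            # terms present in every document (their IDF is zero).
            if 2 <= df < n_docs:
                scores[term] = (count / len(toks)) * math.log(n_docs / df)
        # Rank by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected.update(ranked[:top_k])

    return {term: set(postings[term]) for term in selected}


def neighbors(hot, doc_id):
    """Return, per hyperedge containing doc_id, the other documents on it."""
    return {term: ids - {doc_id} for term, ids in hot.items() if doc_id in ids}


example_docs = {
    "d1": "graph neural networks for text",
    "d2": "text retrieval with sparse vectors",
    "d3": "graph algorithms and sparse matrices",
}
example_hot = build_hot(example_docs, top_k=2)
```

Calling `neighbors(example_hot, "d1")` lists the hyperedges a reader could follow from `d1` and the documents each one leads to, which is the navigation step a HoT is meant to support. Any serious construction would also need stop-word handling and a principled way to pick `top_k`; both are left out here to keep the sketch short.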