SI · CL · IR · Oct 2, 2016

Text Network Exploration via Heterogeneous Web of Topics

arXiv:1610.00219v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need to quickly make sense of text networks such as hyperlinked webpages or citation networks, offering a tool for researchers and analysts. It appears incremental, since it builds on existing probabilistic models.

The paper tackles text network exploration by constructing a heterogeneous web of topics that connects the word level with the document level. It demonstrates effectiveness through qualitative analyses and good performance on objective evaluation metrics on real-life text networks.

A text network is a data type in which each vertex is associated with a text document and the relationships between documents are represented by edges. The proliferation of text networks such as hyperlinked webpages and academic citation networks has led to an increasing demand for quickly developing a general sense of a new text network, a task we call text network exploration. In this paper, we address the problem of text network exploration by constructing a heterogeneous web of topics, which allows people to investigate a text network by associating the word level with the document level. To achieve this, we propose a probabilistic generative model for text and links, in which three different relationships in the heterogeneous topic web are quantified. We also develop a prototype demo system named TopicAtlas to exhibit this heterogeneous topic web, and demonstrate how the system can facilitate text network exploration. Extensive qualitative analyses are included to verify the effectiveness of the heterogeneous topic web. In addition, we validate our model on real-life text networks, showing that it maintains good performance on objective evaluation metrics.
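The definition of a text network given in the abstract (documents on vertices, relationships on edges) can be illustrated with a minimal Python sketch. This is only a toy data structure for the input, not the paper's generative model or the TopicAtlas system; the class name `TextNetwork`, its methods, the directed-edge choice, and the sample documents are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextNetwork:
    """Toy text network (hypothetical): documents on vertices, links as edges."""
    docs: dict[str, str] = field(default_factory=dict)        # vertex id -> text
    edges: set[tuple[str, str]] = field(default_factory=set)  # (source, target)

    def add_doc(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def add_link(self, src: str, dst: str) -> None:
        # Both endpoints must already be vertices in the network.
        if src not in self.docs or dst not in self.docs:
            raise KeyError("both endpoints must be existing documents")
        self.edges.add((src, dst))

    def neighbors(self, doc_id: str) -> list[str]:
        # Documents that doc_id links to, sorted for a stable order.
        return sorted(dst for src, dst in self.edges if src == doc_id)


# Made-up sample data: a two-paper citation network.
net = TextNetwork()
net.add_doc("p1", "topic models for citation networks")
net.add_doc("p2", "latent dirichlet allocation")
net.add_link("p1", "p2")  # p1 cites p2
```

Edges are stored as directed pairs here because hyperlinks and citations both have a direction; an undirected network could store each pair in both orders instead.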

