HC, Oct 31, 2017

Doris: A tool for interactive exploration of historic corpora (Extended Version)

arXiv:1711.00714v1, 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective tools for researchers analyzing historical corpora to understand social phenomena, though it is incremental as it builds on existing keyword-based methods.

The authors address the problem of analyzing large document corpora for social insights by extending keyword-based techniques to incorporate semantic features. The result is Doris, an interactive tool that combines semantic features with information retrieval to make exploration and insight generation easier, illustrated with examples from a corpus of US presidential speeches.

Insights into social phenomena can be gleaned from trends and patterns in corpora of documents associated with those phenomena. Recent years have witnessed the use of computational techniques, mostly based on keywords, to analyze large corpora for these purposes. In this paper, we extend these techniques to incorporate semantic features. We introduce Doris, an interactive exploration tool that combines semantic features with information retrieval techniques to enable exploration of document corpora associated with a social phenomenon. We discuss the semantic techniques and describe an implementation on a corpus of United States (US) presidential speeches. We illustrate, with examples, how the ability to combine syntactic and semantic features in a visualization helps researchers gain insights into the underlying phenomenon more easily.

