Crystallizing Schemas with Teleoscope: Thematic Curation of Large Text Corpora on Reddit
For qualitative researchers, Teleoscope addresses a key bottleneck: making large text corpora tractable for interpretivist analysis so that workflows remain methodologically coherent.
Teleoscope is a web-based interface for the thematic curation of large text corpora, such as Reddit posts, that supports iterative, interactive, and reflexive refinement. Across three deployments, it supported serendipitous keyword discovery, increased confidence in search saturation, and aided collaborative discussion of curation pathways.
Large text corpora, such as Reddit posts, have become an increasingly prevalent site of qualitative inquiry. However, most large text corpora are too large for qualitative researchers to work with directly. As a result, teams rely on statistical subsampling to reduce corpora to a manageable size for qualitative analysis. While prior systems for navigating large corpora visualize the dataset at the corpus level using high-level statistical summaries, few offer the ability to curate data using an interpretivist approach. To address this gap, we developed Teleoscope, a web-based interface designed to scaffold iterative, interactive, and reflexive refinement of a large corpus, a process we call thematic curation. Across three deployments, we found that Teleoscope supports serendipitous discovery of new keywords, increases researchers' confidence in search saturation, and aids collaborative discussion of alternative curation pathways. Teleoscope empowers researchers to stay "close to the data," making qualitative workflows methodologically coherent with large text corpora.