Perspectives - Interactive Document Clustering in the Discourse Analysis Tool Suite
This work addresses the challenge Digital Humanities scholars face in analyzing unstructured document collections, though it appears incremental, since it extends an existing tool suite.
The paper tackles the problem of exploring and organizing large unstructured document collections for Digital Humanities scholars by introducing Perspectives, an interactive document clustering extension with human-in-the-loop refinement that lets users uncover topics and sentiments through a flexible pipeline.
This paper introduces Perspectives, an interactive extension of the Discourse Analysis Tool Suite designed to help Digital Humanities (DH) scholars explore and organize large, unstructured document collections. Perspectives implements a flexible, aspect-focused document clustering pipeline with human-in-the-loop refinement capabilities. We show how this process can be steered from the start by defining analytical lenses through document rewriting prompts and instruction-based embeddings, and then further aligned with user intent through tools for refining clusters and mechanisms for fine-tuning the embedding model. The demonstration walks through a typical workflow, illustrating how DH researchers can use the interactive document map of Perspectives to uncover topics, sentiments, or other relevant categories, gaining insights and preparing their data for subsequent in-depth analysis.
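To make the idea of an analytical lens more concrete, the following is a minimal, self-contained sketch of an aspect-focused clustering loop. It is not the Perspectives implementation: the LLM rewriting prompt is replaced by a hypothetical keyword filter (`rewrite`), the instruction-based embedding model by a normalised bag-of-words vector (`embed`), and clustering by plain k-means with farthest-point initialisation. All function names, lens vocabularies, and example documents are illustrative assumptions. Running the same documents through two different lenses shows how the lens, rather than the raw text alone, can decide which documents end up together.

```python
import math
from collections import Counter

# Hypothetical stand-in for an LLM rewriting prompt: keep only the tokens
# that belong to the chosen analytical lens.
def rewrite(doc: str, lens: set[str]) -> str:
    tokens = (t.strip(".,!?;:").lower() for t in doc.split())
    return " ".join(t for t in tokens if t in lens)

# Hypothetical stand-in for an instruction-based embedding model:
# an L2-normalised bag-of-words vector over the lens vocabulary.
def embed(text: str, vocab: list[str]) -> list[float]:
    counts = Counter(text.split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dist2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Plain k-means with deterministic farthest-point initialisation.
def kmeans(vectors: list[list[float]], k: int, iters: int = 10) -> list[int]:
    cents = [list(vectors[0])]
    while len(cents) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in cents))
        cents.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, cents[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                cents[c] = centroid(members)
    return labels

# Human-in-the-loop correction: the user moves one document to another cluster.
def reassign(labels: list[int], doc_idx: int, cluster: int) -> list[int]:
    fixed = list(labels)
    fixed[doc_idx] = cluster
    return fixed

# Rewrite each document through the lens, embed it, then cluster.
def cluster_with_lens(docs: list[str], lens: set[str], k: int) -> list[int]:
    vocab = sorted(lens)
    return kmeans([embed(rewrite(d, lens), vocab) for d in docs], k)

docs = [
    "Wonderful music at the festival.",
    "Terrible music at the festival.",
    "Wonderful calm after the storm.",
    "Terrible damage from the storm.",
]
sentiment = cluster_with_lens(docs, {"wonderful", "terrible"}, k=2)
topic = cluster_with_lens(docs, {"festival", "music", "storm"}, k=2)
print(sentiment, topic)  # [0, 1, 0, 1] [0, 0, 1, 1]
```

With the sentiment lens, documents 0 and 2 are grouped against 1 and 3. With the topic lens, the festival reports are grouped against the storm reports. `reassign` only stands in for manual cluster editing; using such corrections to fine-tune the embedding model, as the abstract describes, is not modeled in this sketch.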