CL · CY · DL · HC · IR | Feb 2, 2017

Topic Modeling the Hàn diăn Ancient Classics

arXiv:1702.00860v1 · 14 citations
AI Analysis

This work addresses the problem of supporting computational analysis by humanities scholars who study culturally significant ancient Chinese materials. It represents an incremental application of existing methods to a new domain.

The authors tackled the challenge of analyzing over 18,000 ancient Chinese documents by applying probabilistic topic modeling to the Handian corpus, resulting in a software tool that aids in discovering and interpreting themes in these texts.

Ancient Chinese texts present an area of enormous challenge and opportunity for humanities scholars interested in exploiting computational methods to assist in the development of new insights and interpretations of culturally significant materials. In this paper we describe a collaborative effort between Indiana University and Xi'an Jiaotong University to support exploration and interpretation of a digital corpus of over 18,000 ancient Chinese documents, which we refer to as the "Handian" ancient classics corpus (Hàn diăn gŭ jí, i.e., the "Han canon" or "Chinese classics"). It contains classics of ancient Chinese philosophy, documents of historical and biographical significance, and literary works. We begin by describing the Digital Humanities context of this joint project, and the advances in humanities computing that made this project feasible. We describe the corpus and introduce our application of probabilistic topic modeling to this corpus, with attention to the particular challenges posed by modeling ancient Chinese documents. We give a specific example of how the software we have developed can be used to aid discovery and interpretation of themes in the corpus. We outline more advanced forms of computer-aided interpretation that are also made possible by the programming interface provided by our system, and the general implications of these methods for understanding the nature of meaning in these texts.
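The abstract does not spell out the model, so what follows is only a minimal sketch of what probabilistic topic modeling over such a corpus can look like. It assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, and it treats each Han character as one token because classical Chinese is written without spaces between words. The function names `char_tokenize`, `lda_gibbs` and `top_words`, the hyperparameters, and the toy texts are hypothetical illustrations, not the authors' software, settings, or data.

```python
import random


def char_tokenize(text):
    # Assumption: one token per character. Keep only characters in the CJK
    # Unified Ideographs block (U+4E00 to U+9FFF); punctuation, whitespace and
    # rarer extension-block characters are dropped.
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists.

    Returns (vocab, phi) where phi[k][w] estimates p(vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # n_dk: tokens in doc d with topic k
    topic_word = [[0] * V for _ in range(K)]   # n_kw: times word w has topic k
    topic_total = [0] * K                      # n_k: tokens assigned to topic k
    z = []                                     # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for tok in doc:
            k, w = rng.randrange(K), index[tok]
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                w, k = index[tok], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic-word distributions; each row sums to 1.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return vocab, phi


def top_words(vocab, phi, n=5):
    # The n most probable characters for each topic.
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in phi
    ]


# Toy usage with four short hypothetical documents.
texts = ["學而時習之，不亦說乎", "有朋自遠方來，不亦樂乎", "天下大亂，諸侯相攻", "諸侯不朝，天下不安"]
docs = [char_tokenize(t) for t in texts]
vocab, phi = lda_gibbs(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_words(vocab, phi)):
    print(k, "".join(words))
```

On a corpus this small the recovered topics are unstable. A run over thousands of documents would also call for stop-character filtering, more iterations, and in practice a dedicated topic modeling library rather than a hand-written sampler.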
