CL · Jul 11, 2017

Leipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data Analysis

arXiv:1707.03253v1 · 18 citations
Originality: Synthesis-oriented
AI Analysis

This infrastructure addresses the need for scalable text analysis tools in fields such as the social sciences and media studies. Its contribution is incremental, because it combines existing methods.

The paper introduces the Leipzig Corpus Miner, a technical infrastructure that integrates close and distant reading methods for qualitative and quantitative content analysis. It enables analysts to apply NLP techniques to large document collections, as demonstrated in a political science study on post-democracy and neoliberalism.

This paper presents the "Leipzig Corpus Miner", a technical infrastructure for supporting qualitative and quantitative content analysis. The infrastructure aims to integrate 'close reading' procedures on individual documents with procedures of 'distant reading', e.g. lexical characteristics of large document collections. To this end, information retrieval systems, lexicometric statistics, and machine learning procedures are combined in a coherent framework that enables qualitative data analysts to use state-of-the-art Natural Language Processing techniques on very large document collections. The framework's applicability ranges from the social sciences to media studies and market research. As an example, we introduce the use of the framework in a political science study on post-democracy and neoliberalism.

