IR CL May 11, 2018

iLCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data

arXiv:1805.11404v1, 1088 citations
Originality Synthesis-oriented
AI Analysis

The work addresses the need for reproducible and scalable text analysis tools in computational social science and digital humanities, though its contribution is incremental because it builds on existing components.

The iLCM project developed a virtual research infrastructure to analyze large-scale qualitative data, integrating a decentralized SaaS application for text mining with an open research computing environment for reproducible notebooks, enabling high-performance processing and customization for social sciences and digital humanities.

The iLCM project pursues the development of an integrated research environment for the analysis of structured and unstructured data in a "Software as a Service" (SaaS) architecture. The research environment addresses requirements for the quantitative evaluation of large amounts of qualitative data with text mining methods, as well as requirements for the reproducibility of data-driven research designs in the social sciences. To this end, the iLCM research environment comprises two central components. The first is the Leipzig Corpus Miner (LCM), a decentralized SaaS application for the analysis of large amounts of news texts that was developed in a previous Digital Humanities project. The second is an "Open Research Computing" (ORC) environment for executable script documents, so-called "notebooks", which extends the text mining tools implemented in the LCM. This novel integration makes it possible to combine generic, high-performance methods for processing large amounts of unstructured text data with individual program scripts that address specific research requirements in computational social science and digital humanities.
