Dynamic Natural Language Processing with Recurrence Quantification Analysis
This work addresses the need for dynamic analysis in natural language processing and offers researchers in computational linguistics a new approach to text. The contribution is incremental, because it adapts an existing technique from dynamical systems rather than introducing a new one.
The authors analyze text as a dynamic process by applying recurrence quantification analysis (RQA), which treats text as a time series. The resulting method extracts multiple measures in one common framework and relates them to existing NLP measures. It is demonstrated in an example analysis of 8,000 texts from the Gutenberg Project.
Writing and reading are dynamic processes. As an author composes a text, a sequence of words is produced. The author hopes that this sequence causes readers to revisit certain thoughts and ideas. Both composition by the author and revisitation by readers are ordered in time, which means that text itself can be investigated through the lens of dynamical systems. Recurrence quantification analysis (RQA), a common technique for analyzing the behavior of dynamical systems, can be used to analyze the sequential structure of text. RQA treats text as a sequential measurement, much like a time series, and can thus be seen as a kind of dynamic natural language processing (NLP). This extension has several benefits. First, because RQA is part of a suite of time series analysis tools, many measures can be extracted in one common framework. Second, these measures are closely related to some commonly used measures from natural language processing. Finally, recurrence analysis offers an opportunity to expand the analysis of text by developing theoretical descriptions derived from complex dynamic systems. We showcase an example analysis on 8,000 texts from the Gutenberg Project, compare it to well-known NLP approaches, and describe an R package (crqanlp) that can be used in conjunction with the R library crqa.
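To make the idea of treating text as a sequential measurement concrete, the following is a minimal sketch in Python. It is not the crqanlp or crqa implementation. All function names are hypothetical, and several choices are assumptions made for illustration: the simple regular-expression tokenizer, the exclusion of the main diagonal, and the minimum line length of 2. It builds a categorical recurrence matrix in which a point marks two positions holding the same word, then computes two standard RQA measures: recurrence rate and determinism.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def recurrence_matrix(tokens):
    """Return R where R[i][j] == 1 if tokens i and j are the same word.

    The main diagonal (i == j) is excluded, since every word trivially
    recurs with itself. This exclusion is an assumption of this sketch.
    """
    n = len(tokens)
    return [
        [1 if i != j and tokens[i] == tokens[j] else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(matrix):
    """Fraction of off-diagonal cells that are recurrent points."""
    n = len(matrix)
    if n < 2:
        return 0.0
    return sum(map(sum, matrix)) / (n * n - n)


def determinism(matrix, min_line=2):
    """Share of recurrent points lying on diagonal lines of >= min_line points.

    A diagonal line corresponds to a repeated word sequence in the text.
    """
    n = len(matrix)
    total = sum(map(sum, matrix))
    if total == 0:
        return 0.0
    on_lines = 0
    for offset in range(-(n - 1), n):
        if offset == 0:
            continue  # skip the excluded main diagonal
        run = 0
        # Walk along the diagonal defined by j = i + offset.
        for i in range(max(0, -offset), min(n, n - offset)):
            if matrix[i][i + offset]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:
            on_lines += run
    return on_lines / total


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The cat ran.")
    matrix = recurrence_matrix(tokens)
    print(recurrence_rate(matrix), determinism(matrix))
```

The matrix here takes quadratic memory in the number of tokens, which is fine for a short passage but would likely need a sparse or windowed representation for book-length texts.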