CL · Aug 12, 2013

B(eo)W(u)LF: Facilitating recurrence analysis on multi-level language

arXiv:1308.2696v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work provides a practical tool for researchers in linguistics and discourse analysis to structure and analyze multi-level linguistic data, although its contribution is incremental because it builds on existing data-format ideas.

The authors introduce B(eo)W(u)LF, a data format for multi-level language analysis, and provide tools in Python and MATLAB for recurrence-based discourse analysis, demonstrated on 319 lines of a modern English translation of Beowulf.

Discourse analysis may seek to characterize not only the overall composition of a given text but also the dynamic patterns within the data. This technical report introduces a data format intended to facilitate multi-level investigations, which we call the by-word long-form or B(eo)W(u)LF. Inspired by the long-form data format required for mixed-effects modeling, B(eo)W(u)LF structures linguistic data into an expanded matrix encoding any number of researcher-specified markers, making it ideal for recurrence-based analyses. While we do not necessarily claim to be the first to use methods along these lines, we have created a series of tools utilizing Python and MATLAB to enable such discourse analyses and demonstrate them using 319 lines of the Old English epic poem, Beowulf, translated into modern English.
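To make the idea of a by-word long-form matrix concrete, here is a minimal Python sketch, assuming a layout of one row per word with line, position, and word columns plus an arbitrary set of marker columns. The names `WordRow`, `to_long_form`, `recurrence_matrix`, `recurrence_rate`, and the `length` marker are hypothetical illustrations, not the authors' actual tools or column names. The recurrence step uses standard categorical recurrence: two words recur when they share the same value in the chosen column.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical long-form row: one row per word, plus any number of
# researcher-specified marker columns (names are illustrative only).
@dataclass
class WordRow:
    line: int                # 1-based line number in the text
    position: int            # 1-based word position within the line
    word: str                # lowercased token with edge punctuation removed
    markers: Dict[str, str]  # marker name -> marker value for this word


def to_long_form(lines: List[str],
                 marker_fns: Dict[str, Callable[[str], str]]) -> List[WordRow]:
    """Expand raw text lines into a by-word long-form table."""
    rows: List[WordRow] = []
    for line_no, text in enumerate(lines, start=1):
        tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
        tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
        for pos, tok in enumerate(tokens, start=1):
            markers = {name: fn(tok) for name, fn in marker_fns.items()}
            rows.append(WordRow(line_no, pos, tok, markers))
    return rows


def value_of(row: WordRow, column: str) -> str:
    """Read either the word itself or one of the marker columns."""
    return row.word if column == "word" else row.markers[column]


def recurrence_matrix(rows: List[WordRow], column: str) -> List[List[int]]:
    """Categorical recurrence: R[i][j] = 1 when rows i and j share a value."""
    values = [value_of(r, column) for r in rows]
    n = len(values)
    return [[1 if values[i] == values[j] else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(matrix: List[List[int]]) -> float:
    """Share of recurrent points, excluding the trivial main diagonal."""
    n = len(matrix)
    if n < 2:
        return 0.0
    hits = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * (n - 1))


if __name__ == "__main__":
    # Toy input (not Beowulf) with one hypothetical marker column.
    toy = ["the king sat in the hall", "the hall was bright"]
    rows = to_long_form(toy, {"length": lambda w: "long" if len(w) > 3 else "short"})
    print(round(recurrence_rate(recurrence_matrix(rows, "word")), 3))  # prints 0.089
```

On the toy input, the ten word rows recur only through the repeated "the" (three times) and "hall" (twice), giving 8 recurrent off-diagonal points out of 90. Swapping `"word"` for `"length"` runs the same analysis on the marker level instead, which is the point of keeping every level as a column in one table. Plain nested lists keep the sketch dependency-free; a real pipeline would more likely use NumPy or pandas.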
