Neurobiber: Fast and Interpretable Stylistic Feature Extraction
This work addresses scalable, interpretable stylistic analysis for researchers and practitioners in NLP, forensic analysis, and text monitoring. The contribution is incremental: it builds on Biber's existing MDA framework and mainly adds efficiency improvements.
The paper tackles the challenge of extracting detailed stylistic features at scale by introducing Neurobiber, a transformer-based system that predicts 96 Biber-style features. It runs up to 56 times faster than existing open-source systems, replicates classic MDA insights on the CORE corpus, and shows competitive results on the PAN 2020 authorship verification task.
Linguistic style is pivotal for understanding how texts convey meaning and fulfill communicative purposes, yet extracting detailed stylistic features at scale remains challenging. We present Neurobiber, a transformer-based system for fast, interpretable style profiling built on Biber's Multidimensional Analysis (MDA). Neurobiber predicts 96 Biber-style features from our open-source BiberPlus library (a Python toolkit that computes stylistic features and provides integrated analytics, e.g., PCA and factor analysis). Despite being up to 56 times faster than existing open-source systems, Neurobiber replicates classic MDA insights on the CORE corpus and achieves competitive performance on the PAN 2020 authorship verification task without extensive retraining. Its efficient and interpretable representations readily integrate into downstream NLP pipelines, facilitating large-scale stylometric research, forensic analysis, and real-time text monitoring. All components are made publicly available.
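To make the idea of plugging stylistic feature vectors into a downstream task concrete, the following is a minimal sketch of how two documents' feature vectors might be compared for authorship verification. It is not the method evaluated in the paper and does not call Neurobiber or BiberPlus: the function names `cosine_similarity` and `same_author`, the threshold of 0.9, and the four-value toy vectors (standing in for the 96 predicted features) are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector carries no stylistic signal to compare.
        return 0.0
    return dot / (norm_a * norm_b)


def same_author(a, b, threshold=0.9):
    """Hypothetical decision rule: similar style implies same author.

    The threshold is an illustrative assumption, not a value from the paper;
    in practice it would be tuned on held-out verification pairs.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for per-document stylistic features.
# Real vectors would have 96 entries, one per Biber-style feature.
doc_a = [0.12, 0.00, 0.31, 0.05]
doc_b = [0.10, 0.01, 0.29, 0.06]
doc_c = [0.00, 0.40, 0.02, 0.30]

print(same_author(doc_a, doc_b))
print(same_author(doc_a, doc_c))
```

Because each vector position corresponds to a named stylistic feature, a decision like this could in principle be traced back to the features that differ most between the two documents, which is the kind of interpretability the abstract emphasizes.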