Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles
This work addresses the challenge of characterizing idiosyncratic writing styles in online registers. The contribution is incremental: it builds on existing studies of social variation by shifting the focus to personal attributes.
The paper characterizes individual writing styles (idiolects) by introducing a neural model for authorship identification on short texts. The model achieves strong performance, and probing tasks and text perturbation analysis reveal regularities in the stylistic features it learns.
An individual's variation in writing style is often a function of both social and personal attributes. While structured social variation has been extensively studied, e.g., gender-based variation, far less is known about how to characterize individual styles due to their idiosyncratic nature. We introduce a new approach to studying idiolects through a massive cross-author comparison to identify and encode stylistic features. The neural model achieves strong performance at authorship identification on short texts, and through an analogy-based probing task we show that the learned representations exhibit surprising regularities that encode qualitative and quantitative shifts of idiolectal styles. Through text perturbation, we quantify the relative contributions of different linguistic elements to idiolectal variation. Furthermore, we provide a description of idiolects through measuring inter- and intra-author variation, showing that variation in idiolects is often distinctive yet consistent.
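
To make the "distinctive yet consistent" claim concrete, the following is a minimal sketch of one way inter- and intra-author variation could be measured over style embeddings. It is an illustration under stated assumptions, not the paper's method: the choice of cosine distance, the function names, and the toy two-dimensional vectors are all hypothetical, and in practice the vectors would come from a trained authorship model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def intra_author_variation(embeddings_by_author):
    """Mean pairwise distance between texts written by the same author."""
    dists = [
        cosine_distance(u, v)
        for vectors in embeddings_by_author.values()
        for u, v in combinations(vectors, 2)
    ]
    return sum(dists) / len(dists)


def inter_author_variation(embeddings_by_author):
    """Mean pairwise distance between texts written by different authors."""
    dists = [
        cosine_distance(u, v)
        for a, b in combinations(embeddings_by_author, 2)
        for u in embeddings_by_author[a]
        for v in embeddings_by_author[b]
    ]
    return sum(dists) / len(dists)


# Hypothetical toy embeddings: two texts per author, two dimensions each.
toy_embeddings = {
    "author_a": [[1.0, 0.1], [0.9, 0.2]],
    "author_b": [[0.1, 1.0], [0.2, 0.9]],
}

intra = intra_author_variation(toy_embeddings)
inter = inter_author_variation(toy_embeddings)
print(f"intra-author: {intra:.3f}, inter-author: {inter:.3f}")
```

Under this reading, low intra-author variation corresponds to a consistent style, and high inter-author variation corresponds to a distinctive one. The toy vectors are built so that the first quantity is much smaller than the second.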