CL, AI, LG · Jan 12

From Word Sequences to Behavioral Sequences: Adapting Modeling and Evaluation Paradigms for Longitudinal NLP

Adithya V Ganesan, Vasudha Varadarajan, Oscar NE Kjell, Whitney R Ringwald, Scott Feltman, Benjamin J Luft, Roman Kotov, Ryan L Boyd, H Andrew Schwartz

arXiv:2601.07988v11.1

Originality: Highly original

AI Analysis

This work addresses the need for ecologically valid NLP in longitudinal research, such as mental health studies, by shifting from word-sequence to behavior-sequence paradigms. That shift is a foundational advance rather than an incremental change.

The paper tackles the problem of NLP models incorrectly assuming that documents are independent in longitudinal studies. It proposes a paradigm that updates evaluation splits, metrics, inputs, and model internals to handle behavioral sequences, and it demonstrates on a dataset of 17k daily diary transcripts that traditional methods can yield substantially different or even reversed conclusions.

While NLP typically treats documents as independent and unordered samples, in longitudinal studies, this assumption rarely holds: documents are nested within authors and ordered in time, forming person-indexed, time-ordered $\textit{behavioral sequences}$. Here, we demonstrate the need for and propose a longitudinal modeling and evaluation paradigm that consequently updates four parts of the NLP pipeline: (1) evaluation splits aligned to generalization over people ($\textit{cross-sectional}$) and/or time ($\textit{prospective}$); (2) accuracy metrics separating between-person differences from within-person dynamics; (3) sequence inputs to incorporate history by default; and (4) model internals that support different $\textit{coarseness}$ of latent state over histories (pooled summaries, explicit dynamics, or interaction-based models). We demonstrate the issues that arise from the traditional pipeline, along with our proposed improvements, on a dataset of 17k daily diary transcripts paired with PTSD symptom severity from 238 participants, finding that traditional document-level evaluation can yield substantially different and sometimes reversed conclusions compared to our ecologically valid modeling and evaluation. We tie our results to a broader discussion motivating a shift from word-sequence evaluation toward $\textit{behavior-sequence}$ paradigms for NLP.
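To make items (1) and (2) of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical record layout of `(person_id, day_index, predicted_score, true_score)`, and all function names are illustrative. The between/within decomposition uses person-mean centering, which is one common way to separate the two sources of variation and may differ from the metrics the paper actually reports.

```python
from collections import defaultdict
from statistics import mean

# Assumed (hypothetical) record layout:
#   (person_id, day_index, predicted_score, true_score)


def cross_sectional_split(records, held_out_people):
    """Generalize over people: held-out people never appear in training."""
    train = [r for r in records if r[0] not in held_out_people]
    test = [r for r in records if r[0] in held_out_people]
    return train, test


def prospective_split(records, cutoff_day):
    """Generalize over time: train on earlier days, test on later days."""
    train = [r for r in records if r[1] < cutoff_day]
    test = [r for r in records if r[1] >= cutoff_day]
    return train, test


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def between_within_correlation(records):
    """Return (between-person r, within-person r) via person-mean centering."""
    by_person = defaultdict(list)
    for person, _day, pred, true in records:
        by_person[person].append((pred, true))

    mean_pred, mean_true = [], []  # one entry per person
    dev_pred, dev_true = [], []    # one entry per document
    for rows in by_person.values():
        mp = mean(p for p, _ in rows)
        mt = mean(t for _, t in rows)
        mean_pred.append(mp)
        mean_true.append(mt)
        dev_pred.extend(p - mp for p, _ in rows)
        dev_true.extend(t - mt for _, t in rows)

    between = pearson(mean_pred, mean_true)
    within = pearson(dev_pred, dev_true)
    return between, within


# Toy data: predictions track who scores high on average (between-person)
# but move in the wrong direction day to day (within-person).
records = [
    ("a", 0, 1.0, 3.0), ("a", 1, 2.0, 2.0), ("a", 2, 3.0, 1.0),
    ("b", 0, 11.0, 13.0), ("b", 1, 12.0, 12.0), ("b", 2, 13.0, 11.0),
]
pooled = pearson([r[2] for r in records], [r[3] for r in records])
between, within = between_within_correlation(records)
print(pooled, between, within)
```

On this toy data the pooled document-level correlation is strongly positive even though the within-person correlation is negative. This illustrates the kind of reversal the abstract describes, although the numbers here are constructed and say nothing about the paper's results.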
