Context is Enough: Empirical Validation of $\textit{Sequentiality}$ on Essays
This work provides a validated, interpretable feature for automated essay scoring, addressing a specific need in educational NLP, but the contribution is incremental because it builds on prior critiques and proposals.
The paper validates a context-only version of sequentiality for measuring narrative flow in essays. It shows that this version aligns more closely with human assessments of discourse traits such as Organization and Cohesion, and that it adds predictive value when combined with linguistic features, a combination that outperforms zero-shot LLM predictions.
Recent work has proposed using Large Language Models (LLMs) to quantify narrative flow through a measure called sequentiality, which combines topic and contextual terms. A recent critique argued that the original results were confounded by how topics were selected for the topic-based component, and noted that the metric had not been validated against ground-truth measures of flow. That work proposed using only the contextual term as a more conceptually valid and interpretable alternative. In this paper, we empirically validate that proposal. Using two essay datasets with human-annotated trait scores, ASAP++ and ELLIPSE, we show that the contextual version of sequentiality aligns more closely with human assessments of discourse-level traits such as Organization and Cohesion. While zero-shot prompted LLMs predict trait scores more accurately than the contextual measure alone, the contextual measure adds more predictive value than both the topic-only and original sequentiality formulations when combined with standard linguistic features. Notably, this combination also outperforms the zero-shot LLM predictions, highlighting the value of explicitly modeling sentence-to-sentence flow. Our findings support the use of context-based sequentiality as a validated, interpretable, and complementary feature for automated essay scoring and related NLP tasks.
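To make the three formulations compared above concrete, the following minimal Python sketch computes them from precomputed per-sentence log-likelihoods. It assumes that the topic term is the length-normalized negative log-likelihood of a sentence given only the topic, that the contextual term is the same quantity given the preceding sentences, and that the original measure is their difference. The exact conditioning, sign conventions, and essay-level aggregation are assumptions made for illustration, not details stated in this abstract. The names `SentenceLogProbs`, `topic_term`, `context_term`, `original_sequentiality`, and `essay_score` are hypothetical, and the numbers are made-up inputs rather than results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class SentenceLogProbs:
    """Hypothetical container for LLM log-likelihoods of one sentence."""
    n_tokens: int        # number of tokens in the sentence
    logp_topic: float    # log p(sentence | topic only)  (assumed input)
    logp_context: float  # log p(sentence | preceding sentences)  (assumed input)


def topic_term(s: SentenceLogProbs) -> float:
    # Per-token negative log-likelihood given only the topic (assumed form).
    return -s.logp_topic / s.n_tokens


def context_term(s: SentenceLogProbs) -> float:
    # Per-token negative log-likelihood given the preceding sentences (assumed form).
    # Under this convention, lower values mean the sentence is more predictable
    # from what came before.
    return -s.logp_context / s.n_tokens


def original_sequentiality(s: SentenceLogProbs) -> float:
    # Assumed combined form: how much the preceding context helps beyond the topic.
    return topic_term(s) - context_term(s)


def essay_score(
    sentences: List[SentenceLogProbs],
    term: Callable[[SentenceLogProbs], float],
) -> float:
    # Aggregate a sentence-level term into one essay-level feature by averaging
    # (the aggregation choice is an assumption of this sketch).
    return mean(term(s) for s in sentences)


# Made-up illustrative inputs for a two-sentence essay.
essay = [
    SentenceLogProbs(n_tokens=10, logp_topic=-40.0, logp_context=-25.0),
    SentenceLogProbs(n_tokens=8, logp_topic=-32.0, logp_context=-24.0),
]

features = {
    "topic_only": essay_score(essay, topic_term),
    "context_only": essay_score(essay, context_term),
    "original": essay_score(essay, original_sequentiality),
}
```

In a validation setup like the one described, each essay-level value would serve as one feature, either correlated directly with human trait scores or added to a set of standard linguistic features in a predictive model.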