CL · Aug 1, 2025

Team "better_call_claude": Style Change Detection using a Sequential Sentence Pair Classifier

arXiv:2508.00675v1 · 1 citation · h-index: 7 · CLEF
Originality: Incremental advance
AI Analysis

The paper addresses a challenging problem in computational authorship analysis that matters to both researchers and practitioners, though the contribution appears incremental because it builds on previous PAN work.

The paper tackles style change detection at the sentence level by proposing a Sequential Sentence Pair Classifier (SSPC) that uses a pre-trained language model and BiLSTM to contextualize sentences. The model achieves macro-F1 scores of 0.923, 0.828, and 0.724 on EASY, MEDIUM, and HARD datasets, outperforming baselines including a zero-shot large language model.
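For readers unfamiliar with the metric, the following is a minimal sketch of macro-F1 for binary style-change labels (1 = change between two adjacent sentences, 0 = no change). It assumes the unweighted mean of the per-class F1 scores; the official PAN evaluator is not described here and may differ in detail. The function name `macro_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (assumed definition of macro-F1)."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that never occurs and is never predicted contributes 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels): one label per pair of adjacent sentences.
y_true = [0, 0, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
score = macro_f1(y_true, y_pred)
print(score)  # 0.625 (F1 of 0.75 for class 0, 0.5 for class 1)
```

Because both classes count equally, a model that always predicts "no change" scores poorly on macro-F1 even when style changes are rare.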

Style change detection - identifying the points in a document where writing style shifts - remains one of the most important and challenging problems in computational authorship analysis. At PAN 2025, the shared task challenges participants to detect style switches at the most fine-grained level: individual sentences. The task spans three datasets, each designed with controlled and increasing thematic variety within documents. We propose to address this problem by modeling the content of each problem instance - that is, a series of sentences - as a whole, using a Sequential Sentence Pair Classifier (SSPC). The architecture leverages a pre-trained language model (PLM) to obtain representations of individual sentences, which are then fed into a bidirectional LSTM (BiLSTM) to contextualize them within the document. The BiLSTM-produced vectors of adjacent sentences are concatenated and passed to a multi-layer perceptron, which makes one prediction per adjacent pair. Building on the work of previous PAN participants and on classical text segmentation, the approach is relatively conservative and lightweight. Nevertheless, it proves effective in leveraging contextual information and addressing what is arguably the most challenging aspect of this year's shared task: the notorious problem of "stylistically shallow", short sentences that are prevalent in the proposed benchmark data. Evaluated on the official PAN 2025 test datasets, the model achieves strong macro-F1 scores of 0.923, 0.828, and 0.724 on the EASY, MEDIUM, and HARD data, respectively, outperforming not only the official random baselines but also a much more challenging one: claude-3.7-sonnet's zero-shot performance.
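The pipeline described in the abstract (sentence vectors, BiLSTM, concatenation of adjacent vectors, MLP) can be sketched in PyTorch roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name, the layer sizes (768, 256, 128), and the 0.5 decision threshold are hypothetical, and random tensors stand in for the PLM sentence representations so that the example runs without downloading a model.

```python
import torch
import torch.nn as nn


class SequentialSentencePairClassifier(nn.Module):
    """Illustrative SSPC head: BiLSTM over sentence vectors, MLP over adjacent pairs."""

    def __init__(self, emb_dim: int = 768, hidden_dim: int = 256, mlp_dim: int = 128):
        super().__init__()
        # Contextualizes each sentence vector within its document.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # A contextualized vector has size 2 * hidden_dim, so a concatenated
        # pair of adjacent sentences has size 4 * hidden_dim.
        self.mlp = nn.Sequential(
            nn.Linear(4 * hidden_dim, mlp_dim),
            nn.ReLU(),
            nn.Linear(mlp_dim, 1),
        )

    def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
        # sent_emb: (batch, n_sentences, emb_dim), assumed to come from a PLM.
        ctx, _ = self.bilstm(sent_emb)                       # (batch, n, 2 * hidden_dim)
        pairs = torch.cat([ctx[:, :-1], ctx[:, 1:]], dim=-1)  # (batch, n - 1, 4 * hidden_dim)
        return self.mlp(pairs).squeeze(-1)                   # (batch, n - 1) logits


torch.manual_seed(0)
model = SequentialSentencePairClassifier()
# Stand-in for PLM output: 2 documents, 5 sentences each, 768-dim vectors.
fake_embeddings = torch.randn(2, 5, 768)
logits = model(fake_embeddings)
# Hypothetical decision rule: predict a style change where sigmoid(logit) > 0.5.
predictions = (torch.sigmoid(logits) > 0.5).long()
print(tuple(logits.shape))  # (2, 4): one logit per adjacent sentence pair
```

A document with n sentences yields n - 1 predictions, one per sentence boundary, and every prediction can draw on the whole document through the BiLSTM rather than only on the two sentences being compared.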

