CVJun 3
Handwriting Extraction and Analysis of Signature Lists in Swiss Popular InitiativesMarco Peer, Thomas Gorges, Mathias Seuret et al.
Popular initiatives and referendums are central to Swiss democracy, yet the validation of handwritten signature lists remains a labor-intensive manual process. This paper investigates the potential of automated document analysis methods, including OCR and AI-based handwriting analysis, to support this task. We propose a pipeline combining template-based line segmentation with text recognition and writer retrieval techniques, evaluated on a dataset of 443 handwritten entries from 418 writers. Results show that OCR struggles with out-of-vocabulary handwriting, with a CER of 29.6% for first names. In contrast, writer retrieval performs more robustly, reaching an mAP of 50.6%. Furthermore, our experiments indicate that off-the-shelf OCR systems are not sufficiently reliable for transcription of handwritten signature data, particularly for short, out-of-vocabulary entries such as names or addresses. However, writer retrieval methods can effectively identify visually similar entries across signature lists, making them a suitable tool for supporting the detection of potential duplicate submissions based on handwriting similarity.
CVFeb 19
DRetHTR: Linear-Time Decoder-Only Retentive Network for Handwritten Text RecognitionChanghun Kim, Martin Mayr, Thomas Gorges et al.
State-of-the-art handwritten text recognition (HTR) systems commonly use Transformers, whose growing key-value (KV) cache makes decoding slow and memory-intensive. We introduce DRetHTR, a decoder-only model built on Retentive Networks (RetNet). Compared to an equally sized decoder-only Transformer baseline, DRetHTR delivers 1.6-1.9x faster inference with 38-42% less memory usage, without loss of accuracy. By replacing softmax attention with softmax-free retention and injecting multi-scale sequential priors, DRetHTR avoids a growing KV cache: decoding is linear in output length in both time and memory. To recover the local-to-global inductive bias of attention, we propose layer-wise gamma scaling, which progressively enlarges the effective retention horizon in deeper layers. This encourages early layers to model short-range dependencies and later layers to capture broader context, mitigating the flexibility gap introduced by removing softmax. Consequently, DRetHTR achieves best reported test character error rates of 2.26% (IAM-A, en), 1.81% (RIMES, fr), and 3.46% (Bentham, en), and is competitive on READ-2016 (de) with 4.21%. This demonstrates that decoder-only RetNet enables Transformer-level HTR accuracy with substantially improved decoding speed and memory efficiency.
CVMay 8
ICDAR 2026 Competition on Writer Identification and Pen Classification from Hand-Drawn CirclesThomas Gorges, Janne van der Loop, Lukas Hüttner et al.
This paper presents CircleID, a large-scale ICDAR 2026 competition on writer identification and pen classification from scanned hand-drawn circles. The primary objective is to investigate how biometric writer characteristics and physical pen features naturally entangle within minimal, static traces. CircleID comprises two distinct tasks: (1) open-set writer identification, requiring models to recognize known writers while explicitly rejecting unknown ones, and (2) cross-writer pen classification, evaluated across both seen and unseen writers. Participants were provided with a new, controlled dataset of 46,155 tightly cropped circle images, digitized at 400 DPI and annotated for writer identity and pen type. The dataset comprises samples from 50 known and 16 unknown writers using eight different pens. Hosted on Kaggle as two separate tracks with public and private leaderboards, the competition provided participants with a ResNet baseline. In total, 389 teams (436 participants) made 3,185 submissions for the pen classification task, and 113 teams (141 participants) made 1,737 submissions for the writer identification track. The best-performing private leaderboard submissions achieved a Top-1 accuracy of 64.801% for writer identification and 92.726% for pen classification. This paper details the dataset, evaluates the winning methodologies, and analyzes the impact of out-of-distribution writers on model generalization and feature disentanglement. In this large-scale competition, CircleID establishes a new baseline for minimal-trace analysis.
ASMay 1, 2025
Perceptual Implications of Automatic Anonymization in Pathological SpeechSoroosh Tayebi Arasteh, Saba Afza, Tri-Thien Nguyen et al.
Automatic anonymization techniques are essential for ethical sharing of pathological speech data, yet their perceptual consequences remain understudied. We present a comprehensive human-centered analysis of anonymized pathological speech, using a structured protocol involving ten native and non-native German listeners with diverse linguistic, clinical, and technical backgrounds. Listeners evaluated anonymized-original utterance pairs from 180 speakers spanning Cleft Lip and Palate, Dysarthria, Dysglossia, Dysphonia, and healthy controls. Speech was anonymized using state-of-the-art automatic methods (equal error rates in the range of 30-40%). Listeners completed Turing-style discrimination and quality rating tasks under zero-shot (single-exposure) and few-shot (repeated-exposure) conditions. Discrimination accuracy was high overall (91% zero-shot; 93% few-shot), but varied by disorder (repeated-measures ANOVA: p=0.007), ranging from 96% (Dysarthria) to 86% (Dysphonia). Anonymization consistently reduced perceived quality across groups (from 83% to 59%, p<0.001), with pathology-specific degradation patterns (one-way ANOVA: p=0.005). Native listeners showed a non-significant trend toward higher original speech ratings (Delta=4%, p=0.199), but this difference was minimal after anonymization (Delta=1%, p=0.724). No significant gender-based bias was observed. Perceptual outcomes did not correlate with automatic metrics; intelligibility was linked to perceived quality in original speech but not after anonymization. These findings underscore the need for listener-informed, disorder-specific anonymization strategies that preserve both privacy and perceptual integrity.