CV Mar 3

ScribeTokens: Fixed-Vocabulary Tokenization of Digital Ink

arXiv:2603.02805v11.5
Originality: Highly original
AI Analysis

This work addresses how to represent digital ink for handwritten text recognition and generation. It is aimed at researchers and developers building handwriting-based human-computer interaction systems.

With self-supervised pretraining, the proposed ScribeTokens representation achieves a character error rate (CER) of 8.27% on the IAM dataset and 9.83% on the DeepWriting dataset, the best recognition results among the representations compared. On handwritten text generation, ScribeTokens reaches 17.33% CER versus 70.29% for vector representations.
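For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between the reference and the predicted text, divided by the reference length. The sketch below shows that standard definition only. The function names are illustrative, and the paper's exact evaluation protocol (text normalization, per-sample versus corpus-level averaging) is not stated here, so treat this as an assumption rather than the authors' scoring code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a CER of 8.27% corresponds to roughly 8 character edits per 100 reference characters.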

Digital ink -- the coordinate stream captured from stylus or touch input -- lacks a unified representation. Continuous vector representations produce long sequences and suffer from training instability, while existing token representations require large vocabularies, face out-of-vocabulary issues, and underperform vectors on recognition. We propose ScribeTokens, a tokenization that decomposes pen movement into unit pixel steps. Together with two pen-state tokens, this fixed 10-token base vocabulary suffices to represent any digital ink and enables aggressive BPE compression. On handwritten text generation, ScribeTokens dramatically outperforms vectors (17.33% vs. 70.29% CER), showing tokens are far more effective for generation. On recognition, ScribeTokens is the only token representation to outperform vectors without pretraining. We further introduce next-ink-token prediction as a self-supervised pretraining strategy, which consistently improves recognition across all token-based models and accelerates convergence by up to 83x. With pretraining, ScribeTokens achieves the best recognition results across all representations on both datasets (8.27% CER on IAM, 9.83% on DeepWriting).
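The idea of decomposing pen movement into unit pixel steps can be sketched in a few lines. The abstract gives only the vocabulary size (10 tokens, two of which are pen states), so the split into eight grid directions, the token names, the Bresenham-style line walk, and the handling of pen-up travel between strokes are all assumptions made for illustration, not the authors' implementation. Coordinates are assumed to be integer pixels with y growing downward.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Stroke = List[Point]

# Hypothetical token names: 8 unit steps on the pixel grid plus 2 pen states.
DIRECTIONS = {
    (1, 0): "E", (1, 1): "SE", (0, 1): "S", (-1, 1): "SW",
    (-1, 0): "W", (-1, -1): "NW", (0, -1): "N", (1, -1): "NE",
}
PEN_DOWN, PEN_UP = "DOWN", "UP"
VOCAB = list(DIRECTIONS.values()) + [PEN_DOWN, PEN_UP]  # 10 tokens in total


def unit_steps(start: Point, end: Point) -> List[str]:
    """Walk from start to end in 8-connected unit steps (Bresenham line)."""
    x, y = start
    x1, y1 = end
    dx, dy = abs(x1 - x), abs(y1 - y)
    sx = 1 if x1 > x else -1
    sy = 1 if y1 > y else -1
    err = dx - dy
    steps = []
    while (x, y) != (x1, y1):
        e2 = 2 * err
        step_x = step_y = 0
        if e2 > -dy:  # advance along x
            err -= dy
            step_x = sx
        if e2 < dx:   # advance along y
            err += dx
            step_y = sy
        x, y = x + step_x, y + step_y
        steps.append(DIRECTIONS[(step_x, step_y)])
    return steps


def encode(strokes: List[Stroke]) -> List[str]:
    """Encode strokes as direction tokens, bracketed by pen-state tokens."""
    if not strokes:
        return []
    tokens: List[str] = []
    pos = strokes[0][0]  # the first point serves as the origin
    for stroke in strokes:
        tokens += unit_steps(pos, stroke[0])  # pen-up travel to the stroke start
        tokens.append(PEN_DOWN)
        pos = stroke[0]
        for point in stroke[1:]:
            tokens += unit_steps(pos, point)
            pos = point
        tokens.append(PEN_UP)
    return tokens


def decode(tokens: List[str]) -> List[Stroke]:
    """Rebuild pixel-level strokes, relative to an origin at (0, 0)."""
    inverse = {name: delta for delta, name in DIRECTIONS.items()}
    strokes: List[Stroke] = []
    current = None
    pos = (0, 0)
    for tok in tokens:
        if tok == PEN_DOWN:
            current = [pos]
        elif tok == PEN_UP:
            strokes.append(current)
            current = None
        else:
            ddx, ddy = inverse[tok]
            pos = (pos[0] + ddx, pos[1] + ddy)
            if current is not None:
                current.append(pos)
    return strokes
```

For example, `encode([[(0, 0), (3, 1)], [(5, 1), (5, 3)]])` yields a sequence drawn only from `VOCAB`, whatever the stroke lengths. That closed vocabulary is what rules out out-of-vocabulary tokens, and the long runs of repeated direction tokens are the kind of input that BPE merges can shorten.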

