CV · May 7

Understanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modeling

arXiv:2605.05900 · 54.6 · h-index: 6
Predicted impact: top 65% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work clarifies the mechanism behind cross-language transfer in HTR for Arabic-script languages, which is important for improving low-resource recognition systems.

The paper investigates whether cross-language transfer improvements in low-resource handwritten text recognition are due to shared visual representations or sequence-level dependencies. Controlled experiments show that CRNN models benefit from multi-script training, while CNN-only models do not, indicating that sequence modeling is key for effective transfer.

Handwritten Text Recognition (HTR) for Arabic-script languages benefits from cross-language joint training under low-resource conditions, particularly when using CRNN-based models that combine convolutional encoders with sequence modeling. However, it remains unclear whether these improvements are better explained by shared visual representations or sequence-level dependencies. In this work, we conduct a controlled architectural study of line-level Arabic-script HTR, comparing CNN-only models with CTC decoding and CRNN models under identical single-script and multi-script training regimes. Experiments are performed on Arabic (KHATT), Urdu (NUST-UHWR), and Persian (PHTD) datasets under low-resource settings (K in {100, 500, 1000}). Our results show a clear divergence in transfer behavior: while CNN-only models exhibit limited or unstable improvements, CRNN models achieve better performance under multi-script training, particularly in the most data-constrained regimes. Focusing on transfer improvements (delta CER) rather than absolute performance, we find that cross-language improvements are associated with sequence-level modeling, while sharing visual representations learned by the CNN encoder, corresponding to similarities in character shapes across scripts, alone appears to be insufficient. This finding suggests that contextual modeling plays an important role in enabling effective transfer in low-resource scenarios, and that similar behavior may extend to other low-resource language settings.
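The abstract compares models by transfer improvement (delta CER) rather than absolute error. As a minimal sketch of how such a quantity could be computed, the snippet below defines character error rate as total edit distance divided by total reference length, and delta CER as the single-script CER minus the multi-script CER, so that a positive value means joint training helped. The sign convention, the corpus-level (rather than per-line) averaging, and the function names `edit_distance`, `cer`, and `delta_cer` are assumptions for illustration; the abstract does not specify them. The sample strings are made up and are not results from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level character error rate: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


def delta_cer(cer_single: float, cer_multi: float) -> float:
    """Assumed convention: positive when multi-script training lowers CER."""
    return cer_single - cer_multi


if __name__ == "__main__":
    # Illustrative transcriptions only (hypothetical model outputs).
    refs = ["كتاب", "سلام"]
    hyps_single = ["كتب", "سلام"]   # single-script model drops one character
    hyps_multi = ["كتاب", "سلام"]   # multi-script model matches the reference
    gain = delta_cer(cer(refs, hyps_single), cer(refs, hyps_multi))
    print(f"delta CER = {gain:.3f}")
```

Computing this value separately for the CNN-only and CRNN models at each K would give the kind of comparison the abstract describes, where the architecture with the larger and more stable delta is the one benefiting from transfer.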

