CV · Aug 28, 2018

A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units

arXiv:1808.09183v1 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of building efficient multilingual handwriting recognition systems for applications that must support several languages. The advance is incremental, since it builds on existing unified optical models.

The paper tackles the problem of designing a unified multilingual handwriting recognition system by using sub-lexical multigrams to reduce lexicon size and complexity, achieving state-of-the-art performance with a strong reduction in complexity.

We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rest on specialized models, each trained on a single language, one of which is selected at test time. While some recognition systems are based on a unified optical model, building a unified language model remains a major issue, as traditional language models are generally trained on corpora with a large word lexicon per language. Here, we propose a solution based on language models over sub-lexical units, called multigrams. Working with multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes possible the design of an end-to-end unified multilingual recognition system in which both a single optical model and a single language model are trained on all the languages. We discuss the impact of language unification on each model and show that our system matches the performance of state-of-the-art methods with a strong reduction in complexity.
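The claim that sub-lexical units shrink the lexicon can be illustrated with a toy sketch. The abstract does not say how multigrams are extracted, so everything below is an assumption made for illustration: the unit inventory is hand-picked, the `segment` helper is hypothetical, and greedy longest-match splitting is a stand-in rather than the authors' method. The vocabulary is also constructed so that every word splits cleanly into a stem and a suffix, so the gap it shows is illustrative only.

```python
def segment(word, units):
    """Greedy longest-match split of `word` into pieces drawn from `units`.

    Illustrative stand-in only; not the multigram extraction used in the paper.
    """
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in units:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {i}")
    return pieces


# Toy word lexicon: every stem combined with every suffix (hypothetical data).
stems = ["play", "work", "talk", "walk", "jump"]
suffixes = ["", "s", "ed", "er", "ing"]
word_lexicon = {stem + suffix for stem in stems for suffix in suffixes}

# Hand-picked sub-lexical unit inventory (an assumption for this sketch).
units = set(stems) | {suffix for suffix in suffixes if suffix}

segmented = {word: segment(word, units) for word in sorted(word_lexicon)}
unit_lexicon = {piece for pieces in segmented.values() for piece in pieces}

print(len(word_lexicon), len(unit_lexicon))  # prints "25 9"
```

In this toy setting 25 word forms are covered by 9 units, because words are combinations of a much smaller set of reusable pieces. A language model defined over the units therefore needs a far smaller vocabulary than one defined over whole words.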

