CVMar 17, 2025

Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari

arXiv:2503.13060v21 citationsh-index: 8ICDAR
Originality Incremental advance
AI Analysis

This work addresses the preservation of 40 million deteriorating medieval Indian documents by enabling automated transliteration, which is incremental as it builds on existing OCR methods but focuses on a specific script conversion task.

The authors tackled the problem of transliterating historical Modi script documents to Devanagari by introducing the MoDeTrans dataset with 2,043 images and the MoScNet VLM framework, which uses knowledge distillation to achieve better performance with 163× fewer parameters than the teacher model.

In medieval India, the Marathi language was written using the Modi script. The texts written in Modi script include extensive knowledge about medieval sciences, medicines, land records and authentic evidence about Indian history. Around 40 million documents are in poor condition and have not yet been transliterated. Furthermore, only a few experts in this domain can transliterate this script into English or Devanagari. Most of the past research predominantly focuses on individual character recognition. A system that can transliterate Modi script documents to Devanagari script is needed. We propose the MoDeTrans dataset, comprising 2,043 images of Modi script documents accompanied by their corresponding textual transliterations in Devanagari. We further introduce MoScNet (\textbf{Mo}di \textbf{Sc}ript \textbf{Net}work), a novel Vision-Language Model (VLM) framework for transliterating Modi script images into Devanagari text. MoScNet leverages Knowledge Distillation, where a student model learns from a teacher model to enhance transliteration performance. The final student model of MoScNet has better performance than the teacher model while having 163$\times$ lower parameters. Our work is the first to perform direct transliteration from the handwritten Modi script to the Devanagari script. MoScNet also shows competitive results on the optical character recognition (OCR) task.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes