DL · IR · Sep 3, 2014

Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

arXiv:1409.1284v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work improves accessibility for users of archived dictionaries, especially in complex-script languages, though the contribution is incremental because it builds on existing indexing and crowdsourcing techniques.

The authors tackled the problem of making raster images of dictionary pages searchable with minimal manual effort, resulting in a web application that enables a single person to index a 1,000-page dictionary in under an hour.

We propose an approach to indexing raster images of dictionary pages that requires very little manual effort yet enables direct access to the appropriate pages of the dictionary for lookup. Accessibility is further improved by feedback and crowdsourcing, which enable highlighting of the specific location on the page where the lookup word is found, as well as annotation, digitization, and fielded searching. This approach is equally applicable to simple scripts and complex writing systems. Using our proposed approach, we have built a Web application called "Dictionary Explorer" that supports word indexes in various languages, and every language can have multiple dictionaries associated with it. Word lookup gives direct access to the appropriate pages of all the dictionaries of that language simultaneously. The application has exploration features such as searching, pagination, and navigating the word index through a tree-like interface. It also supports feedback, annotation, and digitization features. Apart from the scanned images, "Dictionary Explorer" aggregates results from various sources and user contributions in Unicode. We have evaluated the time required for indexing dictionaries of different sizes and complexities in the Urdu language and examined various trade-offs in our implementation. Using our approach, a single person can make a dictionary of 1,000 pages searchable in less than an hour.
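
The abstract does not describe the index structure itself. One plausible way to get page-level lookup with little manual effort is a sparse index that stores only the first headword of each scanned page and resolves a query by binary search. The sketch below illustrates that idea under this assumption; the `first_words` data and the `find_page` helper are hypothetical and are not taken from "Dictionary Explorer".

```python
from bisect import bisect_right

# Hypothetical sparse index: the first headword on each scanned page,
# listed in page order (page 1 first). One manual entry per page.
first_words = ["apple", "banana", "cherry", "grape", "mango"]


def find_page(word, first_words):
    """Return the 1-based page whose headword range should contain `word`."""
    # bisect_right counts the pages whose first headword sorts at or before `word`,
    # which is exactly the number of the page that should hold it.
    position = bisect_right(first_words, word)
    # A word sorting before the very first headword can only be on page 1, if anywhere.
    return max(position, 1)


print(find_page("blueberry", first_words))
```

This comparison relies on Python's default string ordering by code point. For Urdu or another complex script, the keys would need to be compared in the same collation order the printed dictionary uses. With one entry per page, the reported budget of under an hour for 1,000 pages works out to roughly 3.6 seconds per page.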
