A Structural Query System for Han Characters
This work gives font developers and foreign language learners a specialized tool for querying the structure of Han characters. It is an incremental improvement over existing software.
The authors address efficient querying of Han character dictionaries with IDSgrep, a structural query system comprising a data model, a query language, and a bit vector index intended to speed up query operations. Experimental comparisons with other software are reported.
The IDSgrep structural query system for Han character dictionaries is presented. This system includes a data model and syntax for describing the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes) based on the Unicode IDS syntax; a language for querying EIDS databases, designed to suit the needs of font developers and foreign language learners; a bit vector index inspired by Bloom filters for faster query operations; a freely available implementation; and format translation from popular third-party IDS and XML character databases. Experimental results are included, with a comparison to other software used for similar applications.
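To make the idea of a Bloom-filter-inspired bit vector index concrete, the following is a minimal Python sketch and not IDSgrep's actual data model, query language, or index layout. It assumes a simplified representation in which a character's structure is a nested tuple of a Unicode IDS operator (such as ⿰ or ⿱) and its components. The names `signature`, `contains_subtree`, `search`, and `build`, the 64-bit width, and the choice of two hash functions are all hypothetical choices made for illustration. Each dictionary entry gets a bit vector with bits set for every label in its tree; a query's bits must all be present in an entry's vector before the slower exact tree match is attempted.

```python
import hashlib

BITS = 64  # signature width; an arbitrary choice for this sketch


def _bit(label: str, seed: int) -> int:
    """Map a label to one bit position using a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{label}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % BITS


def labels(tree):
    """Yield every operator and component in a nested-tuple tree."""
    if isinstance(tree, str):
        yield tree
    else:
        op, *children = tree
        yield op
        for child in children:
            yield from labels(child)


def signature(tree, hashes: int = 2) -> int:
    """Bloom-style bit vector: OR together the hashed bits of every label."""
    sig = 0
    for label in labels(tree):
        for seed in range(hashes):
            sig |= 1 << _bit(label, seed)
    return sig


def contains_subtree(tree, pattern) -> bool:
    """Exact check: does `pattern` occur as a subtree anywhere in `tree`?"""
    if tree == pattern:
        return True
    if isinstance(tree, str):
        return False
    return any(contains_subtree(child, pattern) for child in tree[1:])


def build(entries):
    """Precompute a signature for each dictionary entry."""
    return {char: (tree, signature(tree)) for char, tree in entries.items()}


def search(index, pattern):
    """Return characters whose structure contains `pattern`."""
    want = signature(pattern)
    hits = []
    for char, (tree, sig) in index.items():
        if sig & want != want:
            continue  # a required bit is missing: cannot match, skip the tree walk
        if contains_subtree(tree, pattern):  # the filter may give false positives
            hits.append(char)
    return hits


# Toy dictionary using standard decompositions of three characters.
ENTRIES = {
    "明": ("⿰", "日", "月"),
    "萌": ("⿱", "艹", ("⿰", "日", "月")),
    "好": ("⿰", "女", "子"),
}

if __name__ == "__main__":
    index = build(ENTRIES)
    print(search(index, ("⿰", "日", "月")))
```

As with a Bloom filter, the bit test can pass for an entry that does not actually match, so the exact check is still required, but it never rejects a true match because every label in a matching subtree also contributes its bits to the containing entry's signature.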