BIRD: Bronze Inscription Restoration and Dating
This work addresses a domain-specific challenge in historical linguistics and archaeology, the restoration and dating of bronze inscriptions, and offers incremental improvements to inscription analysis.
The authors tackle the problem of restoring and dating fragmentary bronze inscriptions from early China. They introduce the BIRD dataset and an allograph-aware masked language modeling framework with a Glyph Net. In the reported experiments, the Glyph Net improves restoration, while glyph-biased sampling improves dating.
Bronze inscriptions from early China are fragmentary and difficult to date. We introduce BIRD (Bronze Inscription Restoration and Dating), a fully encoded dataset grounded in standard scholarly transcriptions and chronological labels. We further propose an allograph-aware masked language modeling framework that integrates domain- and task-adaptive pretraining with a Glyph Net (GN), which links graphemes and allographs. Experiments show that GN improves restoration, while glyph-biased sampling yields gains in dating.
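
The abstract does not spell out how the Glyph Net or glyph-biased sampling is implemented, so the following is only a minimal sketch of one way the idea could work, not the authors' method. It assumes the Glyph Net can be represented as a mapping from each grapheme to its allographs, that "glyph-biased" means tokens covered by the Glyph Net are masked more often, and that "allograph-aware" means a masked allograph is scored against its canonical grapheme. The names `GLYPH_NET`, `build_allograph_index`, `glyph_biased_mask`, and the `bias` parameter are hypothetical, and the tokens are placeholders rather than real inscription characters.

```python
import random

# Hypothetical toy Glyph Net: canonical grapheme -> set of allographs.
# Placeholder symbols only; these are not real inscription characters.
GLYPH_NET = {
    "A": {"A1", "A2"},
    "B": {"B1"},
}


def build_allograph_index(glyph_net):
    """Invert the Glyph Net so every variant points to its canonical grapheme."""
    index = {}
    for grapheme, allographs in glyph_net.items():
        index[grapheme] = grapheme
        for allograph in allographs:
            index[allograph] = grapheme
    return index


def glyph_biased_mask(tokens, index, mask_rate=0.15, bias=3.0,
                      mask_token="[MASK]", rng=None):
    """Mask tokens, favouring those that appear in the Glyph Net.

    Assumption: tokens known to the Glyph Net get sampling weight `bias`,
    all other tokens get weight 1.0. Returns the masked sequence and a dict
    mapping each masked position to its canonical grapheme label.
    """
    rng = rng or random.Random()
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = list(range(len(tokens)))
    weights = [bias if tok in index else 1.0 for tok in tokens]

    # Weighted sampling without replacement: draw one position at a time
    # and remove it (and its weight) from the candidate pool.
    chosen = []
    for _ in range(n_mask):
        pick = rng.choices(range(len(positions)), weights=weights, k=1)[0]
        chosen.append(positions.pop(pick))
        weights.pop(pick)

    chosen_set = set(chosen)
    masked = [mask_token if i in chosen_set else tok
              for i, tok in enumerate(tokens)]
    # Allograph-aware labels: variants collapse to the canonical grapheme;
    # tokens outside the Glyph Net keep their own form as the label.
    labels = {i: index.get(tokens[i], tokens[i]) for i in chosen}
    return masked, labels
```

With a placeholder sequence such as `["A1", "X", "B", "Y", "A2", "Z", "W"]` and `mask_rate=0.3`, two positions are masked, and a masked `A1` or `A2` receives the label `A`. Collapsing variants to a shared label is one simple way to keep a model from being penalised for predicting a different written form of the same grapheme.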