CV · CL · Jun 2, 2024

Deciphering Oracle Bone Language with Diffusion Models

arXiv:2406.00684v3 · 34 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deciphering ancient languages for historians and linguists, representing an incremental advance by applying modern AI techniques to a domain-specific problem.

The paper tackles the decipherment of Oracle Bone Script, an ancient Chinese writing system with limited textual data, by developing a conditional diffusion model called OBSD that generates clues for decipherment. The authors report quantitative results from experiments on an oracle bone script dataset.

Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.
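To make the "conditional diffusion-based strategy" concrete, here is a minimal sketch of generic DDPM-style forward noising and conditional reverse sampling. It is not the authors' OBSD architecture: the linear noise schedule, the function names, and `toy_denoiser` (a hypothetical stand-in for a trained network that would predict noise from the noised image, the conditioning image, and the timestep) are all assumptions for illustration.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product of alphas up to step t
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, noise):
    """Forward process: noise a clean image x0 to timestep t in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def toy_denoiser(x_t, cond, t):
    """Hypothetical placeholder for a trained noise predictor eps(x_t, cond, t).

    A real model would be a neural network conditioned on an image `cond`
    (here imagined as an oracle bone glyph); this stub only keeps the loop runnable.
    """
    return x_t - cond

def p_sample_step(x_t, cond, t, betas, alphas, alpha_bars, denoiser, rng):
    """One reverse (denoising) step of standard DDPM ancestral sampling."""
    eps_hat = denoiser(x_t, cond, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

def sample(cond, denoiser, num_steps=1000, seed=0):
    """Start from Gaussian noise and denoise step by step, guided by `cond`."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(cond.shape)
    for t in reversed(range(num_steps)):
        x = p_sample_step(x, cond, t, betas, alphas, alpha_bars, denoiser, rng)
    return x

# Usage: generate a 4x4 "image" conditioned on an all-zero placeholder glyph.
generated = sample(np.zeros((4, 4)), toy_denoiser, num_steps=10)
print(generated.shape)
```

In a trained system the denoiser would be learned from paired images, and the sampled output would serve as a visual clue rather than a definitive decipherment.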

Code Implementations: 1 repo
