CV · Dec 21, 2023

Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model

arXiv:2312.13631v2 · 2 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This addresses a domain-specific problem in Chinese archaeology and philology by improving oracle script recognition, representing a novel method for a known bottleneck rather than a foundational advancement.

The paper tackles the challenge of deciphering oracle bone scripts by proposing Diff-Oracle, a controllable diffusion model that generates diverse oracle characters using style and content encoders. It yields a 7.70% accuracy gain in zero-shot recognition on the OBC306 dataset and reaches 84.62% accuracy on unseen oracle character images.

Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, a significant challenge remains: oracle character images are scarce. To address this, we propose Diff-Oracle, a novel approach based on diffusion models that generates a diverse range of controllable oracle characters. Unlike traditional diffusion models, which are conditioned primarily on text prompts, Diff-Oracle incorporates a style encoder that uses style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images, converting style details into a text-embedding format via a pretrained vision-language model. In addition, a content encoder integrated within Diff-Oracle captures specific content details from content reference images, ensuring that the generated characters accurately represent the intended glyphs. To train Diff-Oracle effectively, we pre-generate pixel-level paired oracle character images (i.e., style and content images) with an image-to-image translation model. Extensive qualitative and quantitative experiments are conducted on the Oracle-241 and OBC306 datasets. Diff-Oracle not only significantly surpasses existing generative methods in image generation but also substantially benefits downstream oracle character recognition, outperforming all existing state-of-the-art methods by a large margin. In particular, on the challenging OBC306 dataset, Diff-Oracle yields an accuracy gain of 7.70% in the zero-shot setting and recognizes unseen oracle character images with an accuracy of 84.62%, setting a new benchmark for deciphering oracle bone scripts.
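The abstract names the ingredients of one training step: a noisy target image, a style prompt derived from a style reference image, and a content reference image. The toy sketch below shows how those pieces could fit together in a standard diffusion training step. It is an illustration under stated assumptions, not the authors' implementation. It assumes a DDPM-style linear noise schedule with an epsilon-prediction loss. The `style_prompt` function is a hypothetical linear mapper standing in for the pretrained vision-language model. Content conditioning is assumed to work by concatenating the content reference to the noisy input, because the abstract does not say how content features are injected. Images are flattened to plain Python lists to keep the sketch dependency-free.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> Vector:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0: Sequence[float], noise: Sequence[float], alpha_bar: float) -> Vector:
    """Forward (noising) process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]


def style_prompt(image_embedding: Sequence[float],
                 projection: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical style mapper: project an image embedding (e.g. from a CLIP-like
    image encoder) into the text-embedding space with a learned linear map."""
    return [sum(w * e for w, e in zip(row, image_embedding)) for row in projection]


# Hypothetical denoiser signature: (noisy input + content reference, style prompt, t) -> predicted noise.
Denoiser = Callable[[Vector, Vector, int], Vector]


def training_loss(denoiser: Denoiser, target: Sequence[float],
                  content_ref: Sequence[float], style_tokens: Vector,
                  alpha_bars: Vector, rng: random.Random) -> float:
    """One epsilon-prediction training step; returns the mean squared error."""
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in target]
    x_t = q_sample(target, noise, alpha_bars[t])
    # Assumption: content conditioning by concatenating the content reference
    # to the noisy input (a flat-list stand-in for channel-wise concatenation).
    model_input = x_t + list(content_ref)
    pred = denoiser(model_input, style_tokens, t)
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(noise)
```

In a real model the denoiser would be a conditional U-Net that receives the style tokens through cross-attention, and the loss would be averaged over a batch. Here the denoiser is left as a caller-supplied function, so the data flow of one step stays visible.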
