CV · AI · Dec 2, 2024

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Zijian Chen, Tingzhu Chen, Wenjun Zhang, Guangtao Zhai

arXiv:2412.01175v2 · 17.3 · 21 citations · h-index: 10 · Has Code · ICLR

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for domain-specific evaluation in ancient script research, though it is incremental: it creates a new benchmark rather than advancing core model capabilities.

The researchers introduced OBI-Bench, a benchmark of 5,523 images for evaluating large multi-modal models on oracle bone inscription tasks such as recognition and deciphering. They found that even top models like GPT-4o remain far from public-level humans on some fine-grained perception tasks, yet perform comparably to untrained humans at deciphering.

We introduce OBI-Bench, a holistic benchmark crafted to systematically evaluate large multi-modal models (LMMs) on whole-process oracle bone inscription (OBI) processing tasks that demand expert-level domain knowledge and deliberate cognition. OBI-Bench includes 5,523 meticulously collected images from diverse sources, covering five key domain problems: recognition, rejoining, classification, retrieval, and deciphering. These images span centuries of archaeological findings and years of research by front-line scholars, and they cover multi-stage font appearances from excavation to synthesis, such as original oracle bones, inked rubbings, oracle bone fragments, cropped single characters, and handprinted characters. Unlike existing benchmarks, OBI-Bench focuses on advanced visual perception and reasoning with OBI-specific knowledge, challenging LMMs to perform tasks akin to those faced by experts. The evaluation of 6 proprietary LMMs and 17 open-source LMMs highlights the substantial challenges posed by OBI-Bench. Even the latest versions of GPT-4o, Gemini 1.5 Pro, and Qwen-VL-Max are still far from public-level humans in some fine-grained perception tasks. However, they perform at a level comparable to untrained humans in deciphering tasks, indicating remarkable capabilities in offering new interpretative perspectives and generating creative guesses. We hope OBI-Bench can help the community develop domain-specific multi-modal foundation models for ancient language research, and explore and enhance this untapped potential of LMMs.

