CV · Mar 5, 2025

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

arXiv:2503.03644v4 · 5 citations · h-index: 6 · EMNLP
Originality: Synthesis-oriented
AI Analysis

This paper addresses the lack of datasets for Dongba pictograms, a unique cultural script, and so enables research on their multimodal semantic understanding. The contribution is incremental, since it primarily provides a new dataset.

The authors address the semantic understanding of Dongba pictograms by constructing DongbaMIE, the first multimodal information extraction dataset for the script, which contains 23,530 sentence-level and 2,539 paragraph-level text-image pairs. They find that mainstream models struggle with information extraction under zero-shot and few-shot learning, and that supervised fine-tuning offers only limited improvement.

Dongba is the only pictographic script still in use in the world. Its pictorial and ideographic features carry rich cultural and contextual information. However, owing to the lack of relevant datasets, research on the semantic understanding of Dongba pictographs has progressed slowly. To this end, we constructed DongbaMIE, the first dataset focusing on multimodal information extraction from Dongba pictographs. The dataset consists of images of Dongba pictographic characters and their corresponding semantic annotations in Chinese. It contains 23,530 sentence-level and 2,539 paragraph-level high-quality text-image pairs. The annotations cover four semantic dimensions: object, action, relation, and attribute. A systematic evaluation of mainstream multimodal large language models shows that the models struggle to extract information from Dongba pictographs effectively under zero-shot and few-shot learning. Although supervised fine-tuning improves performance, accurate extraction of complex semantics remains a major challenge.
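To make the four annotation dimensions concrete, here is a minimal Python sketch of how one text-image pair might be represented and how a model's extraction could be compared against it. The class name `DongbaAnnotation`, its field names, and the helper `score_extraction` are all hypothetical; the dataset's actual file format is not described here. The exact-match set F1 is likewise an assumption chosen for illustration, not necessarily the metric used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DongbaAnnotation:
    """Hypothetical record for one text-image pair (field names are assumptions)."""
    image_path: str   # path to the Dongba pictograph image
    text_zh: str      # Chinese semantic annotation text
    objects: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)
    relations: set[str] = field(default_factory=set)
    attributes: set[str] = field(default_factory=set)


# The four semantic dimensions named in the abstract, as attribute names.
DIMENSIONS = ("objects", "actions", "relations", "attributes")


def set_f1(predicted: set[str], gold: set[str]) -> float:
    """Exact-match F1 between two label sets; defined as 1.0 when both are empty."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_extraction(pred: DongbaAnnotation, gold: DongbaAnnotation) -> dict[str, float]:
    """Per-dimension F1 of a predicted extraction against the gold annotation."""
    return {dim: set_f1(getattr(pred, dim), getattr(gold, dim)) for dim in DIMENSIONS}
```

Scoring each dimension separately shows whether a model fails mainly on, say, relations rather than objects. In this sketch, labels are compared as exact strings, so near-synonyms count as errors.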

Code Implementations: 1 repo
