CVJul 9, 2025

MCCD: A Multi-Attribute Chinese Calligraphy Character Dataset Annotated with Script Styles, Dynasties, and Calligraphers

arXiv:2507.06948v110.23 citationsh-index: 4Has CodeICDAR

Originality Synthesis-oriented

AI Analysis

This provides a detailed dataset for researchers in cultural heritage and computer vision, enabling studies on calligraphy recognition and evolution, but it is incremental as it focuses on data collection rather than novel methods.

The authors tackled the problem of limited datasets for Chinese calligraphy research by creating MCCD, a multi-attribute dataset with 329,715 character images annotated with script styles, dynasties, and calligraphers, which they used to benchmark recognition tasks and found increased difficulty due to stroke complexity and attribute interplay.

Research on the attribute information of calligraphy, such as styles, dynasties, and calligraphers, holds significant cultural and historical value. However, the styles of Chinese calligraphy characters have evolved dramatically through different dynasties and the unique touches of calligraphers, making it highly challenging to accurately recognize these different characters and their attributes. Furthermore, existing calligraphic datasets are extremely scarce, and most provide only character-level annotations without additional attribute information. This limitation has significantly hindered the in-depth study of Chinese calligraphy. To fill this gap, we present a novel Multi-Attribute Chinese Calligraphy Character Dataset (MCCD). The dataset encompasses 7,765 categories with a total of 329,715 isolated image samples of Chinese calligraphy characters, and three additional subsets were extracted based on the attribute labeling of the three types of script styles (10 types), dynasties (15 periods) and calligraphers (142 individuals). The rich multi-attribute annotations render MCCD well-suited diverse research tasks, including calligraphic character recognition, writer identification, and evolutionary studies of Chinese characters. We establish benchmark performance through single-task and multi-task recognition experiments across MCCD and all of its subsets. The experimental results demonstrate that the complexity of the stroke structure of the calligraphic characters, and the interplay between their different attributes, leading to a substantial increase in the difficulty of accurate recognition. MCCD not only fills a void in the availability of detailed calligraphy datasets but also provides valuable resources for advancing research in Chinese calligraphy and fostering advancements in multiple fields. The dataset is available at https://github.com/SCUT-DLVCLab/MCCD.

View on arXiv PDF Code

Similar