CL · AI · Aug 17, 2023

CMB: A Comprehensive Medical Benchmark in Chinese

Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li

arXiv:2308.08833v221.6157 citations · h-index: 30 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of contextual incongruities in medical AI evaluation for Chinese healthcare. It is incremental, however, in that it adapts existing benchmark concepts to a specific region.

The authors tackle the lack of a localized medical benchmark for Chinese contexts by creating CMB, a Comprehensive Medical Benchmark in Chinese, which includes traditional Chinese medicine. They evaluate models such as ChatGPT and GPT-4 and report performance gaps, though without specific numerical results.

Large Language Models (LLMs) offer the possibility of a major breakthrough in medicine. Establishing a standardized medical benchmark is a fundamental cornerstone for measuring progress. However, medical environments in different regions have their own local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluations may result in contextual incongruities for a local region. To solve this issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark will provide first-hand experience with existing LLMs for medicine and also facilitate the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.
