CL, AI · Sep 3, 2024

Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture

Chen-Chi Chang, Ching-Yuan Chen, Hung-Shin Lee, Chih-Cheng Lee

arXiv:2409.01556v2 · 4.87 citations · h-index: 2

Originality: Synthesis-oriented

AI Analysis

The benchmark provides a tool for assessing LLMs in culturally diverse contexts and supports AI-driven cultural preservation. Its contribution is incremental, however, because it applies existing methods to a new cultural dataset.

The study tackled the problem of evaluating large language models' ability to handle cultural knowledge by creating a benchmark based on Hakka culture and Bloom's Taxonomy, finding that Retrieval-Augmented Generation (RAG) improved accuracy across cognitive domains but had limitations in creative tasks.

This study introduces a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in understanding and processing cultural knowledge, with a specific focus on Hakka culture as a case study. Leveraging Bloom's Taxonomy, the study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. This benchmark extends beyond traditional single-dimensional evaluations by providing a deeper analysis of LLMs' abilities to handle culturally specific content, ranging from basic recall of facts to higher-order cognitive tasks such as creative synthesis. Additionally, the study integrates Retrieval-Augmented Generation (RAG) technology to address the challenges of minority cultural knowledge representation in LLMs, demonstrating how RAG enhances the models' performance by dynamically incorporating relevant external information. The results highlight the effectiveness of RAG in improving accuracy across all cognitive domains, particularly in tasks requiring precise retrieval and application of cultural knowledge. However, the findings also reveal the limitations of RAG in creative tasks, underscoring the need for further optimization. This benchmark provides a robust tool for evaluating and comparing LLMs in culturally diverse contexts, offering valuable insights for future research and development in AI-driven cultural knowledge preservation and dissemination.
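The per-domain evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each benchmark item has already been graded as correct or incorrect and tagged with one of the six Bloom's Taxonomy domains. The function names (`accuracy_by_domain`, `rag_gain`) and the record layout are hypothetical and are not taken from the paper. The toy grades are made-up placeholders, not reported results.

```python
from collections import defaultdict

# The six cognitive domains named in the abstract (Bloom's Taxonomy).
BLOOM_DOMAINS = [
    "Remembering", "Understanding", "Applying",
    "Analyzing", "Evaluating", "Creating",
]


def accuracy_by_domain(records):
    """Return {domain: accuracy} for graded (domain, is_correct) pairs.

    Hypothetical helper: the record layout is an assumption for illustration.
    Domains with no graded items are omitted from the result.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for domain, is_correct in records:
        if domain not in BLOOM_DOMAINS:
            raise ValueError(f"unknown cognitive domain: {domain!r}")
        totals[domain] += 1
        correct[domain] += int(bool(is_correct))
    return {d: correct[d] / totals[d] for d in BLOOM_DOMAINS if totals[d]}


def rag_gain(baseline, with_rag):
    """Per-domain accuracy difference (with RAG minus baseline)."""
    return {
        d: with_rag[d] - baseline[d]
        for d in BLOOM_DOMAINS
        if d in baseline and d in with_rag
    }


# Made-up placeholder grades, NOT results from the paper.
toy_baseline = [("Remembering", True), ("Remembering", False),
                ("Creating", False), ("Creating", True)]
toy_with_rag = [("Remembering", True), ("Remembering", True),
                ("Creating", False), ("Creating", True)]

gains = rag_gain(accuracy_by_domain(toy_baseline),
                 accuracy_by_domain(toy_with_rag))
for domain, delta in gains.items():
    print(f"{domain}: {delta:+.2f}")
```

Reporting the gain separately for each domain, rather than as one aggregate score, is what lets a comparison show where retrieval helps and where it does not, such as the creative tasks the abstract singles out.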

