CL | Nov 3, 2025

Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation

Hung-Shin Lee, Chen-Chi Chang, Ching-Yuan Chen, Yun-Hsiang Hsu

arXiv:2511.01649v14.91 citations | h-index: 2 | Electronic library

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for better evaluation of LLMs in cultural knowledge processing. It appears incremental, however, because it combines existing methods (Bloom's Taxonomy and RAG) and applies them to a specific domain.

The study addresses how to evaluate the way large language models process culturally specific knowledge. It proposes a cognitive benchmarking framework that integrates Bloom's Taxonomy with Retrieval-Augmented Generation, and it uses a Taiwanese Hakka digital cultural archive as a testbed for measuring semantic accuracy and cultural relevance.

This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
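As a rough illustration of what per-level scoring across the six cognitive domains could look like, here is a minimal Python sketch. It is not the authors' implementation: the `Item` fields, the `token_overlap` function, and `scores_by_level` are hypothetical names, and the word-overlap score is only a crude placeholder for whatever semantic-accuracy and cultural-relevance measures the paper actually uses. Retrieval is not modeled; the sketch assumes the responses were already generated.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six Bloom's Taxonomy levels named in the abstract, lowest to highest.
BLOOM_LEVELS = [
    "Remembering", "Understanding", "Applying",
    "Analyzing", "Evaluating", "Creating",
]


@dataclass
class Item:
    """One benchmark item (hypothetical structure, not from the paper)."""
    level: str      # one of BLOOM_LEVELS
    reference: str  # reference answer
    response: str   # model response, assumed to be generated already


def token_overlap(reference: str, response: str) -> float:
    """Jaccard overlap of lowercase word sets.

    A crude placeholder for a semantic-accuracy metric.
    """
    ref = set(reference.lower().split())
    res = set(response.lower().split())
    if not ref and not res:
        return 1.0
    return len(ref & res) / len(ref | res)


def scores_by_level(items):
    """Return the mean score per Bloom level; levels with no items map to None."""
    buckets = defaultdict(list)
    for item in items:
        if item.level not in BLOOM_LEVELS:
            raise ValueError(f"unknown level: {item.level}")
        buckets[item.level].append(token_overlap(item.reference, item.response))
    return {
        level: (sum(buckets[level]) / len(buckets[level]) if buckets[level] else None)
        for level in BLOOM_LEVELS
    }
```

Calling `scores_by_level` on a list of `Item` objects returns one mean score per level, in taxonomy order, which makes it easy to see whether performance drops at the higher cognitive levels. In practice the placeholder metric would be swapped for an embedding-based or human-rated score.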
