Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
This work addresses the need for better evaluation of how LLMs process cultural knowledge, though the contribution appears incremental: it combines existing methods (Bloom's Taxonomy and RAG) and applies them to a specific domain.
The study tackles the problem of evaluating how large language models process culturally specific knowledge. It proposes a cognitive benchmarking framework that integrates Bloom's Taxonomy with Retrieval-Augmented Generation, and it uses a Taiwanese Hakka digital cultural archive as a testbed to measure semantic accuracy and cultural relevance.
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
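To make the shape of such a benchmark concrete, the following is a minimal Python sketch, not the study's implementation. It assumes each test item is tagged with one of the six Bloom levels, retrieves context from an archive, asks a model for an answer, and averages a score per level. Every name here (`Item`, `retrieve`, `overlap_f1`, `evaluate`, `echo_model`) is hypothetical. The word-overlap retriever and the token-overlap F1 are crude stand-ins for whatever retriever and semantic-accuracy metric the study actually uses, and the sample passages are invented placeholders, not content from the Hakka archive. Cultural relevance is not scored in this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six cognitive levels named in the text, lowest to highest.
BLOOM_LEVELS = ["Remembering", "Understanding", "Applying",
                "Analyzing", "Evaluating", "Creating"]


@dataclass
class Item:
    level: str       # one of BLOOM_LEVELS
    question: str
    reference: str   # reference answer (assumed to come from annotators)


def tokens(text):
    return text.lower().split()


def retrieve(question, archive, k=1):
    """Toy retriever: rank passages by how many words they share with the question."""
    q = set(tokens(question))
    ranked = sorted(archive, key=lambda p: len(q & set(tokens(p))), reverse=True)
    return ranked[:k]


def overlap_f1(answer, reference):
    """Token-overlap F1, an assumed stand-in for a semantic-accuracy metric."""
    a, r = set(tokens(answer)), set(tokens(reference))
    common = len(a & r)
    if common == 0:
        return 0.0
    precision, recall = common / len(a), common / len(r)
    return 2 * precision * recall / (precision + recall)


def evaluate(items, archive, generate):
    """Return the mean score per Bloom level for a `generate(question, context)` callable."""
    scores = defaultdict(list)
    for item in items:
        context = retrieve(item.question, archive)
        answer = generate(item.question, context)
        scores[item.level].append(overlap_f1(answer, item.reference))
    return {level: sum(s) / len(s) for level, s in scores.items()}


def echo_model(question, context):
    """Placeholder for an LLM call: returns the top retrieved passage unchanged."""
    return context[0] if context else ""


# Invented placeholder data, used only to exercise the pipeline.
archive = ["alpha festival is held in spring",
           "beta song uses call and response"]
items = [
    Item("Remembering", "when is alpha festival held",
         "alpha festival is held in spring"),
    Item("Understanding", "how is beta song performed",
         "beta song uses call and response"),
]
report = evaluate(items, archive, echo_model)
```

In a real run, `echo_model` would be replaced by a call to the LLM under test, and `report` would hold one averaged score for each Bloom level that has at least one item, which allows performance to be compared across the cognitive hierarchy.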