Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
This work addresses the need for explainable AI in decision-making domains such as chess, with benefits for model transparency and human education. The contribution is incremental, however, since it builds on existing expert models and language models.
The paper tackles the problem of generating interpretable chess commentary by combining expert models' decision-making with LLMs' linguistic fluency. The resulting method produces accurate, informative, and fluent commentary, as validated by human judges and by the authors' evaluation metric, GCC-Eval.
Deep learning-based expert models have reached superhuman performance in decision-making domains such as chess and Go. However, explaining or commenting on their decisions remains under-explored, even though it is important for model explainability and human education. The outputs of expert models are accurate, yet difficult for humans to interpret. On the other hand, large language models (LLMs) can produce fluent commentary but are prone to hallucinations due to their limited decision-making capabilities. To bridge this gap between expert models and LLMs, we focus on chess commentary as a representative task of explaining complex decision-making processes through language, and we address both the generation and the evaluation of commentary. We introduce Concept-guided Chess Commentary generation (CCC) for producing commentary and GPT-based Chess Commentary Evaluation (GCC-Eval) for assessing it. CCC integrates the decision-making strengths of expert models with the linguistic fluency of LLMs through prioritized, concept-based explanations. GCC-Eval leverages expert knowledge to evaluate chess commentary based on informativeness and linguistic quality. Experimental results, validated by both human judges and GCC-Eval, demonstrate that CCC generates commentary that is accurate, informative, and fluent.