CL, AI | Jan 19, 2025

InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models

Jing Ding, Kai Feng, Binbin Lin, Jiarui Cai, Qiushi Wang, Yu Xie, Xiaojin Zhang, Zhongyu Wei, Wei Chen

arXiv:2501.10943v1 | 6.71 citations | h-index: 47 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of applying LLMs to the specialized Chinese insurance domain. The contribution is incremental: it adapts existing methods to new data.

The authors address the underexplored question of how effective large language models are in the Chinese insurance domain. They introduce InsQABench, a benchmark dataset with three categories, propose the SQL-ReAct and RAG-ReAct methods, and show that fine-tuning on the dataset significantly improves performance.

The application of large language models (LLMs) has achieved remarkable success in various fields, but their effectiveness in specialized domains like the Chinese insurance industry remains underexplored. The complexity of insurance knowledge, encompassing specialized terminology and diverse data types, poses significant challenges for both models and users. To address this, we introduce InsQABench, a benchmark dataset for the Chinese insurance sector, structured into three categories: Insurance Commonsense Knowledge, Insurance Structured Database, and Insurance Unstructured Documents, reflecting real-world insurance question-answering tasks. We also propose two methods, SQL-ReAct and RAG-ReAct, to tackle challenges in structured and unstructured data tasks. Evaluations show that while LLMs struggle with domain-specific terminology and nuanced clause texts, fine-tuning on InsQABench significantly improves performance. Our benchmark establishes a solid foundation for advancing LLM applications in the insurance domain, with data and code available at https://github.com/HaileyFamo/InsQABench.git.
