CL, AI
Jul 19, 2025

ElectriQ: A Benchmark for Assessing the Response Capability of Large Language Models in Power Marketing

arXiv:2507.22911v1
h-index: 6
Originality: Incremental advance
AI Analysis

This work targets inefficiencies in power marketing customer service, such as China's 95598 hotline, by providing a domain-specific benchmark. The contribution is incremental because it builds on existing LLM capabilities.

The paper tackles slow and inaccurate responses in electric power marketing customer service by introducing ElectriQ, a benchmark for evaluating and enhancing large language models (LLMs) in this domain. It shows that smaller models such as Llama3-8B, once fine-tuned and augmented with domain knowledge, can surpass GPT-4o in professionalism and user-friendliness.

Electric power marketing customer service plays a critical role in addressing inquiries, complaints, and service requests. However, current systems, such as China's 95598 hotline, often struggle with slow response times, inflexible procedures, and limited accuracy in domain-specific tasks. While large language models (LLMs) like GPT-4o and Claude 3 demonstrate strong general capabilities, they lack the domain expertise and empathy required in this field. To bridge this gap, we introduce ElectriQ, the first benchmark designed to evaluate and enhance LLMs in electric power marketing scenarios. ElectriQ consists of a dialogue dataset covering six key service categories and introduces four evaluation metrics: professionalism, popularity, readability, and user-friendliness. We further incorporate a domain-specific knowledge base and propose a knowledge augmentation method to boost model performance. Experiments on 13 LLMs reveal that smaller models such as Llama3-8B, when fine-tuned and augmented, can surpass GPT-4o in terms of professionalism and user-friendliness. ElectriQ establishes a comprehensive foundation for developing LLMs tailored to the needs of power marketing services.
