AI | Jul 29, 2025

GovRelBench: A Benchmark for Government Domain Relevance

arXiv:2507.21419v1 | Has Code
Originality: Synthesis-oriented
AI Analysis

GovRelBench provides a domain-specific evaluation tool for government-related LLM research, but the contribution is incremental because it builds on existing benchmark designs and model architectures.

The authors address the lack of evaluation of LLMs' core capabilities in the government domain by proposing GovRelBench, a benchmark that pairs government domain prompts with GovRelBERT, an evaluation model trained with the SoftGovScore method to compute government domain relevance scores. The dataset and code are publicly available.

Current evaluations of LLMs in the government domain primarily focus on safety considerations in specific scenarios, while the assessment of the models' own core capabilities, particularly domain relevance, remains insufficient. To address this gap, we propose GovRelBench, a benchmark specifically designed for evaluating the core capabilities of LLMs in the government domain. GovRelBench consists of government domain prompts and a dedicated evaluation tool, GovRelBERT. During the training process of GovRelBERT, we introduce the SoftGovScore method: this method trains a model based on the ModernBERT architecture by converting hard labels to soft scores, enabling it to accurately compute the text's government domain relevance score. This work aims to enhance the capability evaluation framework for large models in the government domain, providing an effective tool for relevant research and practice. Our code and dataset are available at https://github.com/pan-xi/GovRelBench.
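The abstract says SoftGovScore converts hard labels to soft scores but does not say how. As a rough illustration only, the sketch below assumes the hard labels are ordinal relevance levels and maps them linearly onto [0, 1], with optional smoothing toward 0.5. The functions `hard_to_soft` and `mse`, the `num_levels` and `smoothing` parameters, and the choice of a squared-error objective are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def hard_to_soft(label: int, num_levels: int, smoothing: float = 0.0) -> float:
    """Map an ordinal hard label (0 .. num_levels - 1) to a soft score in [0, 1].

    Hypothetical scheme: linear scaling, optionally pulled toward 0.5.
    This is an assumption, not the conversion used by SoftGovScore.
    """
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    if not 0 <= label < num_levels:
        raise ValueError("label out of range")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    base = label / (num_levels - 1)
    # Blend the scaled label with the midpoint 0.5.
    return (1.0 - smoothing) * base + smoothing * 0.5


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted and target soft scores."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Three assumed relevance levels: 0 = unrelated, 1 = partly related, 2 = related.
hard_labels = [0, 1, 2]
soft_targets = [hard_to_soft(y, num_levels=3) for y in hard_labels]
```

Under these assumptions, a regression head on top of an encoder such as ModernBERT would be trained to predict `soft_targets`, so the model outputs a continuous relevance score instead of a class index.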

