AI | Jul 28, 2025

evalSmarT: An LLM-Based Framework for Evaluating Smart Contract Generated Comments

arXiv:2507.20774v1 | h-index: 1 | ASE
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of scalable, nuanced evaluation of smart contract comments. The contribution is incremental: it applies existing LLMs to a new domain-specific task.

The paper tackles the problem of evaluating smart contract comment quality by introducing evalSmarT, an LLM-based framework that supports over 400 configurations and shows prompt design significantly impacts alignment with human judgment.

Smart contract comment generation has gained traction as a means to improve code comprehension and maintainability in blockchain systems. However, evaluating the quality of generated comments remains a challenge. Traditional metrics such as BLEU and ROUGE fail to capture domain-specific nuances, while human evaluation is costly and unscalable. In this paper, we present evalSmarT, a modular and extensible framework that leverages large language models (LLMs) as evaluators. The system supports over 400 evaluator configurations by combining approximately 40 LLMs with 10 prompting strategies. We demonstrate its application in benchmarking comment generation tools and selecting the most informative outputs. Our results show that prompt design significantly impacts alignment with human judgment, and that LLM-based evaluation offers a scalable and semantically rich alternative to existing methods.
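The configuration count in the abstract comes from pairing each LLM with each prompting strategy. The sketch below illustrates that pairing only; it is not evalSmarT's code. The `EvaluatorConfig` class, the `build_configs` helper, and the placeholder model and strategy names are all hypothetical, and the exact counts (40 and 10) are assumed from the abstract's "approximately 40" and "10".

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EvaluatorConfig:
    """One evaluator: a model paired with a prompting strategy (hypothetical type)."""
    model: str
    strategy: str


def build_configs(models, strategies):
    """Return every (model, strategy) pairing as an EvaluatorConfig."""
    return [EvaluatorConfig(m, s) for m, s in product(models, strategies)]


# Placeholder names; the abstract does not list the actual models or strategies.
models = [f"model-{i}" for i in range(40)]
strategies = [f"strategy-{i}" for i in range(10)]

configs = build_configs(models, strategies)
print(len(configs))  # 400
```

With exactly 40 models and 10 strategies the product is 400; the abstract's "over 400" suggests the real model count is slightly higher.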

