Rongcun Wang

6.3 SE Jun 18

Repository-Level Solidity Code Generation with Large Language Models: From Prompting to Fine-Tuning

Shi Chen, Rongcun Wang, Yuan Tian et al.

Large Language Models (LLMs) have shown strong capabilities in general-purpose code generation, but their effectiveness in specialized software domains remains underexplored. Solidity smart contracts represent a high-stakes domain where generated code must satisfy strict language-level, security, and software-engineering constraints. Existing benchmarks and metrics remain insufficient for repository-level Solidity generation, where models must synthesize complete contracts from natural language requirements. To address this gap, we introduce SolidityBench, a benchmark of 5,470 repository-level Solidity smart contracts paired with natural language descriptions. We also propose SolidityScore, a Solidity-aware semantic metric that emphasizes domain-critical constructs such as security modifiers, contract declarations, and Solidity-specific keywords. Using this benchmark, we evaluate representative code LLMs, including Qwen2.5-Coder, DeepSeek-Coder, and CodeLlama, across zero-shot prompting, Chain-of-Thought reasoning, in-context learning, retrieval-augmented generation, and supervised fine-tuning. The results show that general-purpose models exhibit systematic structural deficiencies in repository-level Solidity generation. Among non-parametric methods, retrieval-augmented generation performs best, while in-context learning degrades beyond two examples due to context saturation. Supervised fine-tuning achieves the largest improvement by internalizing Solidity-specific constraints into model parameters. Overall, our study provides a comprehensive benchmark for repository-level Solidity code generation and shows that high-quality domain data combined with supervised fine-tuning is the most effective strategy for improving the reliability of LLM-generated smart contracts.

6.4 SE Mar 17, 2021

An Integration Test Order Strategy to Consider Control Coupling

Shujuan Jiang, Miao Zhang, Yanmei Zhang et al.

Integration testing is an essential step in software testing. Existing methods evaluate the stubbing cost for class integration test orders by considering only direct interclass relationships such as inheritance, aggregation, and association, but they omit the indirect interclass relationships caused by control coupling, which can also affect the test orders and the stubbing cost. In this paper, we introduce an integration test order strategy that considers control coupling. We advance the concept of a transitive relationship to describe this kind of interclass dependency and propose a new measurement method to estimate the complexity of control coupling, which is the complexity of stubs created for a transitive relationship. We evaluate our integration test order strategy on 10 programs of various scales. The results show that considering the transitive relationship when generating class integration test orders can significantly reduce the stubbing cost for most programs and that our integration test order strategy obtains satisfactory results more quickly than other methods.

2 Papers