SEIRNov 28, 2021

Semantic Code Search for Smart Contracts

arXiv:2111.14139v1
Originality Incremental advance
AI Analysis

This addresses the need for efficient code search tools for smart contract developers, given the high code reuse rate, but it is incremental as it builds on existing methods with domain-specific improvements.

The paper tackles the problem of semantic code search for smart contracts, where existing models have a semantic gap and perform poorly on specialized queries, by proposing a Multi-Modal Smart contract Code Search (MM-SCS) model that achieves an MRR of 0.572, outperforming state-of-the-art models by up to 59.3%.

Semantic code search technology allows searching for existing code snippets through natural language, which can greatly improve programming efficiency. Smart contracts, programs that run on the blockchain, have a code reuse rate of more than 90%, which means developers have a great demand for semantic code search tools. However, the existing code search models still have a semantic gap between code and query, and perform poorly on specialized queries of smart contracts. In this paper, we propose a Multi-Modal Smart contract Code Search (MM-SCS) model. Specifically, we construct a Contract Elements Dependency Graph (CEDG) for MM-SCS as an additional modality to capture the data-flow and control-flow information of the code. To make the model more focused on the key contextual information, we use a multi-head attention network to generate embeddings for code features. In addition, we use a fine-tuned pretrained model to ensure the model's effectiveness when the training data is small. We compared MM-SCS with four state-of-the-art models on a dataset with 470K (code, docstring) pairs collected from Github and Etherscan. Experimental results show that MM-SCS achieves an MRR (Mean Reciprocal Rank) of 0.572, outperforming four state-of-the-art models UNIF, DeepCS, CARLCS-CNN, and TAB-CS by 34.2%, 59.3%, 36.8%, and 14.1%, respectively. Additionally, the search speed of MM-SCS is second only to UNIF, reaching 0.34s/query.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes