DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts
DISL provides a resource for developing machine learning systems and benchmarking tools in the domain of smart contracts, though the contribution is incremental because it builds on existing data aggregation efforts.
The authors tackled the lack of a large, diverse dataset for smart contract research by creating DISL, a collection of 514,506 unique Solidity files from Ethereum, which surpasses existing datasets in size and recency.
The DISL dataset features a collection of 514,506 unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts. By aggregating every verified smart contract from Etherscan up to January 15, 2024, DISL surpasses existing datasets in size and recency.
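The text above does not say how file uniqueness is determined. As a minimal sketch, assuming uniqueness means exact-match content equality, duplicates could be filtered by hashing each file's text. The function name `deduplicate_sources` and the sample contracts are hypothetical and are not taken from DISL itself.

```python
import hashlib


def deduplicate_sources(sources):
    """Return the distinct file contents, keeping first occurrences in order.

    Assumption: two files are duplicates only if their text is byte-identical.
    """
    seen = set()
    unique = []
    for text in sources:
        # Hash the content so the set stores fixed-size digests, not full files.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


# Hypothetical sample inputs: the third file repeats the first.
files = [
    "pragma solidity ^0.8.0; contract A {}",
    "pragma solidity ^0.8.0; contract B {}",
    "pragma solidity ^0.8.0; contract A {}",
]
unique_files = deduplicate_sources(files)
```

Exact-match hashing treats files that differ only in whitespace or comments as distinct; a stricter notion of uniqueness would need a normalization step before hashing.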