Full Bitcoin Blockchain Data Made Easy
This work offers researchers and analysts a practical way to obtain accessible Bitcoin blockchain data, although its contribution is incremental because it builds on existing tools and methods.
The paper tackles the challenge of collecting and processing the full Bitcoin blockchain data, which is difficult because of its size and complexity. It presents a lossless, reproducible procedure with additional indexing that eases data handling and subset selection, and demonstrates it on large-scale use cases such as address clustering.
Despite being publicly available, the full Bitcoin blockchain data is not trivial to collect and process. Its sheer size, history, and other features raise quite specific challenges, which we address in this paper. The strengths of our approach are the following: it relies on very basic and standard tools, which makes the procedure reliable and easily reproducible; it is a purely lossless procedure, ensuring that we capture and preserve all existing data; and it provides additional indexing that makes it easy to further process the whole data and select appropriate subsets of it. We present our procedure in detail and illustrate its added value on large-scale use cases, such as address clustering. We provide an implementation online, as well as the resulting dataset.
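
To make the address clustering use case concrete, here is a minimal sketch of one widely used approach, the common-input-ownership heuristic, which assumes that all addresses spending together in one transaction belong to the same entity. The abstract does not say which heuristic the authors use, so this choice is an assumption. The names `UnionFind`, `cluster_addresses`, and `toy_transactions` are hypothetical, and the toy input (one list of input addresses per transaction) stands in for the real dataset, whose format is not described here.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen addresses as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of a transaction.

    `transactions` is an iterable of lists of input addresses
    (an assumed, simplified format, not the dataset's actual layout).
    """
    uf = UnionFind()
    for input_addresses in transactions:
        if not input_addresses:
            continue  # e.g. coinbase transactions have no input addresses
        first = input_addresses[0]
        uf.find(first)  # make sure single-input addresses are registered
        for addr in input_addresses[1:]:
            uf.union(first, addr)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: each inner list holds one transaction's input addresses.
toy_transactions = [
    ["A", "B"],
    ["B", "C"],
    ["D"],
    [],
    ["E", "F"],
]
clusters = cluster_addresses(toy_transactions)
print(clusters)
```

On the toy data, A, B, and C end up in one cluster because B is spent together with both A and C. A union-find structure is a natural fit at blockchain scale because each merge runs in near-constant amortized time, so the clustering can be done in a single pass over the transactions.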