IR Aug 23, 2023
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz, Kamil Tylinski, Alastair Moore et al.
Distributed Ledger Technology (DLT) faces increasing environmental scrutiny, particularly concerning the energy consumption of the Proof of Work (PoW) consensus mechanism and broader Environmental, Social, and Governance (ESG) issues. However, existing systematic literature reviews of DLT rely on limited analyses of citations, abstracts, and keywords, failing to fully capture the field's complexity and ESG concerns. We address these challenges by analyzing the full text of 24,539 publications using Natural Language Processing (NLP) with our manually labeled Named Entity Recognition (NER) dataset of 39,427 entities for DLT. This methodology identified 505 key publications at the DLT/ESG intersection, enabling comprehensive domain analysis. Our combined NLP and temporal graph analysis reveals critical trends in DLT evolution and ESG impacts, including the foundational influence of cryptography and peer-to-peer networks research, Bitcoin's persistent impact on research and environmental concerns (a "Lindy effect"), Ethereum's catalytic role in Proof of Stake (PoS) and smart contract adoption, and the industry's progressive shift toward energy-efficient consensus mechanisms. Our contributions include the first DLT-specific NER dataset addressing the scarcity of high-quality labeled NLP data in blockchain research, a methodology integrating NLP and temporal graph analysis for large-scale interdisciplinary literature reviews, and the first NLP-driven literature review focusing on DLT's ESG aspects.
70.3 SOC-PH May 26
A Network Inefficiency Metric for Structural Stress Detection in Hedera Transactions
Deep Nath, Paolo Tasca, Nikhil Vadgama et al.
Quantifying structural stress in transaction networks requires metrics that capture structural organization beyond transaction volume alone. In this work, we introduce the Inefficiency Metric, a deterministic indicator designed to characterize the routing structure of capital flows in decentralized systems. Using Principal Component Analysis and Pearson correlation matrices computed from a six-year Hedera transaction dataset, we identify two dominant and largely independent structural dimensions: the effective diameter, related to the spatial extension of transaction propagation, and the closeness centrality, associated with the efficiency of network-level flow processing. The proposed metric reveals significant topological fluctuations associated with major macroeconomic and ecosystem-level events. Increased inefficiency is observed during periods marked by intermediary fragmentation or rapid smart-contract expansion, whereas lower inefficiency corresponds to phases of network compaction during market stress or institutional concentration. Comparison with a seven-dimensional Isolation Forest approach shows that the metric effectively captures severe multidimensional anomalies while preserving a clear structural interpretation. Overall, these results provide a physics-inspired framework for relating the large-scale organization of decentralized transaction networks to observable economic dynamics.
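To make the two structural dimensions named in this abstract concrete, here is a minimal sketch that computes closeness centrality and an effective diameter on a toy undirected graph. It does not reproduce the paper's Inefficiency Metric, whose formula the abstract does not give. The graph, the function names, and the 0.9 quantile used for the effective diameter (a common convention) are all assumptions made for illustration.

```python
from collections import deque
from math import ceil


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(adj, node):
    """Closeness centrality: reachable nodes divided by total distance to them."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def effective_diameter(adj, quantile=0.9):
    """Smallest hop count covering at least `quantile` of connected ordered pairs.

    The 0.9 default is an assumed convention, not a value taken from the paper.
    """
    lengths = sorted(
        d for s in adj for t, d in bfs_distances(adj, s).items() if t != s
    )
    if not lengths:
        return 0
    return lengths[ceil(quantile * len(lengths)) - 1]


# Hypothetical toy networks: a chain (stretched) and a star (compact).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}

print(effective_diameter(path), closeness(path, "b"))  # 3 0.75
print(effective_diameter(star), closeness(star, "h"))  # 2 1.0
```

In this toy comparison the chain has the larger effective diameter and the lower closeness, while the star is compact around its hub. This is only an intuition for the two quantities, not the paper's result.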
CL Feb 25
DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain
Walter Hernandez Cruz, Peter Devine, Nikhil Vadgama et al.
We introduce DLT-Corpus, the largest domain-specific text collection for Distributed Ledger Technology (DLT) research to date: 2.98 billion tokens from 22.12 million documents spanning scientific literature (37,440 publications), United States Patent and Trademark Office (USPTO) patents (49,023 filings), and social media (22 million posts). Existing Natural Language Processing (NLP) resources for DLT focus narrowly on cryptocurrency price prediction and smart contracts, leaving domain-specific language underexplored despite the sector's ~$3 trillion market capitalization and rapid technological evolution. We demonstrate DLT-Corpus' utility by analyzing technology emergence patterns and market-innovation correlations. Findings reveal that technologies originate in scientific literature before reaching patents and social media, following traditional technology transfer patterns. While social media sentiment remains overwhelmingly bullish even during crypto winters, scientific and patent activity grow independently of market fluctuations, tracking overall market expansion in a virtuous cycle where research precedes and enables economic growth that funds further innovation. We publicly release the full DLT-Corpus; LedgerBERT, a domain-adapted model achieving 23% improvement over BERT-base on a DLT-specific Named Entity Recognition (NER) task; and all associated tools and code.
28.5 CR May 11
SoK: A Systematic Bidirectional Literature Review of AI & DLT Convergence
Ali Irzam Kathia, Yimika Erinle, Abylay Satybaldy et al.
The integration of Artificial Intelligence (AI) with Distributed Ledger Technology (DLT) has become a growing research area, yet contributions tend to cluster around specific application domains or examine only one direction of the integration, leaving the broader architectural interplay between the two technologies poorly understood. This work addresses that gap through a structured, bidirectional review of peer-reviewed studies published between 2020 and 2025. We classify contributions along two directions: AI-enhanced DLT, and DLT-enhanced AI. In the first case, we examine how AI techniques improve DLT systems across five layers: data, network, consensus, execution, and application layers. In the second case, we analyse how DLT supports AI systems across five layers: infrastructure, data, model, inference, and application layers, with particular attention to federated learning, model evaluation, and multi-agent coordination. The analysis reveals that most works concentrate on a small subset of layers: execution and consensus for AI-enhanced DLT, data and model for DLT-enhanced AI. Other layers remain comparatively neglected. Despite reported improvements in controlled settings, no study demonstrates deployment at production scale, and the field has not yet offered satisfying answers to fundamental questions around scalability, interoperability, and verifiable execution. We argue that progress will require cross-layer co-design and empirical validation in real-world settings.