CL · May 8, 2025
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
Qianbo Zang, Christophe Zgrzendek, Igor Tchappi et al.
Hierarchical Text Classification (HTC) involves assigning documents to labels organized within a taxonomy. Most previous research on HTC has focused on supervised methods. However, in real-world scenarios, employing supervised HTC can be challenging due to a lack of annotated data. Moreover, HTC often faces issues with large label spaces and long-tail distributions. In this work, we present Knowledge Graphs for zero-shot Hierarchical Text Classification (KG-HTC), which aims to address these challenges of HTC in applications by integrating knowledge graphs with Large Language Models (LLMs) to provide structured semantic context during classification. Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach. KG-HTC helps LLMs understand label semantics at various hierarchy levels. We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon. Our experimental results show that KG-HTC significantly outperforms three baselines in the strict zero-shot setting, achieving particularly substantial improvements at deeper levels of the hierarchy. This evaluation demonstrates the effectiveness of incorporating structured knowledge into LLMs to address HTC's challenges in large label spaces and long-tailed label distributions. Our code is available at: https://github.com/QianboZang/KG-HTC.
CR · Nov 11, 2021
Fairness, Integrity, and Privacy in a Scalable Blockchain-based Federated Learning System
Timon Rückel, Johannes Sedlmeir, Peter Hofmann
Federated machine learning (FL) allows models to be trained collectively on sensitive data, as only the clients' models and not their training data need to be shared. However, despite the attention that research on FL has drawn, the concept still lacks broad adoption in practice. One of the key reasons is the great challenge of implementing FL systems that simultaneously achieve fairness, integrity, and privacy preservation for all participating clients. To contribute to solving this issue, our paper suggests an FL system that incorporates blockchain technology, local differential privacy, and zero-knowledge proofs. Our implementation of a proof-of-concept with multiple linear regression illustrates that these state-of-the-art technologies can be combined into an FL system that aligns economic incentives, trust, and confidentiality requirements in a scalable and transparent system.
CY · Nov 11, 2021
Designing a Framework for Digital KYC Processes Built on Blockchain-Based Self-Sovereign Identity
Vincent Schlatt, Johannes Sedlmeir, Simon Feulner et al.
Know your customer (KYC) processes place a great burden on banks, because they are costly, inefficient, and inconvenient for customers. While blockchain technology is often mentioned as a potential solution, it is not clear how to use the technology's advantages without violating data protection regulations and customer privacy. We demonstrate how blockchain-based self-sovereign identity (SSI) can solve the challenges of KYC. We follow a rigorous design science research approach to create a framework that utilizes SSI in the KYC process, deriving nascent design principles that theorize on blockchain's role for SSI.
SE · Oct 10, 2021
A Serverless Distributed Ledger for Enterprises
Johannes Sedlmeir, Tim Wagner, Emil Djerekarov et al.
Enterprises have been attracted by the capability of blockchains to provide a single source of truth for workloads that span companies, geographies, and clouds while retaining the independence of each party's IT operations. However, so far production applications have remained rare, stymied by technical limitations of existing blockchain technologies and challenges with their integration into enterprises' IT systems. In this paper, we collect enterprises' requirements on distributed ledgers for data sharing and integration from a technical perspective, argue that they are not sufficiently addressed by available blockchain frameworks, and propose a novel distributed ledger design that is "serverless", i.e., built on cloud-native resources. We evaluate its qualitative and quantitative properties and give evidence that enterprises already heavily reliant on cloud service providers would consider such an approach acceptable, particularly if it offers ease of deployment, a low transactional cost structure, and a combination of latency and scalability aligned with real-time IT application needs.
CR · Sep 12, 2021
Harmonizing sensitive data exchange and double-spending prevention through blockchain and digital wallets: The case of e-prescription management
Vincent Schlatt, Johannes Sedlmeir, Janina Traue et al.
The digital transformation of the medical sector requires solutions that are convenient and efficient for all stakeholders while protecting patients' sensitive data. One example that has already attracted design-oriented research is medical prescriptions. However, current implementations of electronic prescription management systems typically create centralized data silos, leaving user data vulnerable to cybersecurity incidents and impeding interoperability. Research has also proposed decentralized solutions based on blockchain technology, but privacy-related challenges have often been ignored. We conduct design science research to develop and implement a system for the exchange of electronic prescriptions that builds on two blockchains and a digital wallet app. Our solution combines the bilateral, verifiable, and privacy-focused exchange of information between doctors, patients, and pharmacies through verifiable credentials with a token-based, anonymized double-spending check. Our qualitative and quantitative evaluations as well as a security analysis suggest that this architecture can improve existing approaches to electronic prescription management by offering patients control over their data by design, a high level of security, sufficient performance and scalability, and interoperability with emerging digital identity management solutions for users, businesses, and institutions. We also derive principles on how to design decentralized, privacy-oriented information systems that require both the exchange of sensitive information and double-usage protection.
CR · Jul 25, 2021
Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature Review
Gonzalo Munilla Garrido, Johannes Sedlmeir, Ömer Uludağ et al.
IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research focused on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to have found practical adoption. Thus, this study aims to organize the state-of-the-art solutions, analyze and scope the technologies that have been suggested in this context, and structure the remaining challenges to determine areas where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020, and provided updates with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently, and no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets and a low degree of leveraging existing libraries or reusing generic data market architectures. We also identified significant challenges remaining, such as the copy problem and the recursive enforcement problem, that, while solutions have been suggested to some extent, are often not sufficiently addressed in proposed designs. We conclude that privacy-enhancing technologies need further improvements to positively impact data markets so that, ultimately, the value of data is preserved through data scarcity and users' privacy and business-critical information are protected.
CR · Feb 15, 2021
Recent Developments in Blockchain Technology and their Impact on Energy Consumption
Johannes Sedlmeir, Hans Ulrich Buhl, Gilbert Fridgen et al.
The enormous power consumption of Bitcoin has led to undifferentiated discussions in science and practice about the sustainability of blockchain and distributed ledger technology in general. However, blockchain technology is far from homogeneous - not only with regard to its applications, which now go far beyond cryptocurrencies and have reached businesses and the public sector, but also with regard to its technical characteristics and, in particular, its power consumption. This paper summarizes the status quo of the power consumption of various implementations of blockchain technology, with special emphasis on the recent 'Bitcoin Halving' and so-called 'zk-rollups'. We argue that although Bitcoin and other proof-of-work blockchains do indeed consume a lot of power, alternative blockchain solutions with significantly lower power consumption are already available today, and new promising concepts are being tested that could further reduce the power consumption of large blockchain networks in particular in the near future. From this we conclude that although the criticism of Bitcoin's power consumption is legitimate, it should not be used to derive an energy problem of blockchain technology in general. In many cases in which processes can be digitised or improved with the help of more energy-efficient blockchain variants, one can even expect net energy savings.
PF · Feb 15, 2021
An In-Depth Investigation of the Performance Characteristics of Hyperledger Fabric
Tobias Guggenberger, Johannes Sedlmeir, Gilbert Fridgen et al.
Private permissioned blockchains are deployed in ever greater numbers to facilitate cross-organizational processes in various industries, particularly in supply chain management. One popular example of this trend is Hyperledger Fabric. Compared to public permissionless blockchains, it promises improved performance and provides certain features that address key requirements of enterprises. However, permissioned blockchains are also still not as scalable as centralized systems, and due to the scarcity of theoretical results and empirical data, their real-world performance cannot be predicted with the necessary precision. We intend to address this issue by conducting an in-depth performance analysis of Hyperledger Fabric. The paper presents a detailed compilation of various performance characteristics using an enhanced version of the Distributed Ledger Performance Scan (DLPS). Researchers and practitioners alike can use the various performance properties identified and discussed as guidelines to better configure and implement their Hyperledger Fabric network. Likewise, they are encouraged to use the DLPS framework to conduct their own measurements.