Willem-Jan van den Heuvel

SE
h-index: 13
14 papers
107 citations
Novelty: 33%
AI Score: 47

14 Papers

77.3 SE Jun 3
A Taxonomy of Runtime Faults in Model Context Protocol Servers

Joshua Owotogbe, Indika Kumara, Willem-Jan van den Heuvel et al.

MCP (Model Context Protocol) enables LLMs (Large Language Models) to interact with external tools and data sources via a standardized protocol. Its rapid adoption in tool-augmented Artificial Intelligence (AI) workflows has introduced new reliability challenges, such as configuration parameters that are accepted but not enforced at runtime, which leads to unintended default behavior. The runtime fault characteristics of MCP servers nevertheless remain empirically unexamined. We present the first empirical taxonomy of runtime faults in MCP servers. We manually analyzed 837 MCP-specific runtime fault threads from 473 actively maintained MCP server GitHub repositories and derived a taxonomy using a bottom-up open coding procedure. The taxonomy comprises 11 top-level categories and 27 subcategories (73 leaf fault types), covering recurrent failures across protocol interactions, tool invocations, schema enforcement, state management, model-provider integration, security validation, and timeouts or explicit cancellations of in-progress operations. To assess the taxonomy's external validity, we surveyed 55 MCP server developers. Respondents reported experiencing an average of 20 of the 27 fault subcategories, and no category remained unobserved. These results indicate that the taxonomy reflects widely observed runtime failures in MCP-based systems and can assist future AI software maintenance and evolution.
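
The "accepted but not enforced" fault mentioned in this abstract can be illustrated with a minimal sketch. Everything named here (`ServerConfig`, `max_results`, the two handler functions) is hypothetical and is not taken from the paper or from any MCP SDK; the sketch only contrasts a handler that silently falls back to a default with one that applies the configured value.

```python
from dataclasses import dataclass

# Hypothetical default limit used when no configuration is applied.
DEFAULT_MAX_RESULTS = 100


@dataclass
class ServerConfig:
    # Hypothetical setting: maximum number of items a tool call may return.
    max_results: int = DEFAULT_MAX_RESULTS


def list_items_faulty(config: ServerConfig, items: list[str]) -> list[str]:
    """Accepts the config but never reads it, so the default silently applies."""
    return items[:DEFAULT_MAX_RESULTS]


def list_items_enforced(config: ServerConfig, items: list[str]) -> list[str]:
    """Validates the configured limit and applies it at runtime."""
    if config.max_results <= 0:
        raise ValueError("max_results must be positive")
    return items[:config.max_results]


items = [f"item-{i}" for i in range(250)]
config = ServerConfig(max_results=10)

faulty = list_items_faulty(config, items)
enforced = list_items_enforced(config, items)
print(len(faulty), len(enforced))  # 100 10
```

The faulty handler raises no error, which is what makes this class of fault hard to notice: the caller asked for 10 results and receives 100.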

CR Apr 1, 2022
Internet-of-Things Architectures for Secure Cyber-Physical Spaces: the VISOR Experience Report

Daniel De Pascale, Giuseppe Cascavilla, Mirella Sangiovanni et al.

Internet of things (IoT) technologies are becoming an increasingly widespread part of civilian life in common urban spaces, which are rapidly turning into cyber-physical spaces. Simultaneously, the fear of terrorism and crime in such public spaces is ever-increasing. Due to the resulting increased demand for security, video-based IoT surveillance systems have become an important area for research. Considering the large number of devices involved in the illicit recognition task, we conducted a field study at a Dutch Easter music festival in a national interest project called VISOR to select the most appropriate device configuration in terms of performance and results. We iteratively architected solutions for the security of cyber-physical spaces using IoT devices. We tested the performance of multiple federated devices encompassing drones, closed-circuit television (CCTV), smartphone cameras, and smart glasses to detect real-case scenarios of potentially malicious activities such as mosh-pits and pick-pocketing. Our results pave the way to select optimal IoT architecture configurations (i.e., a mix of CCTV, drones, smart glasses, and camera phones in our case) to make safer cyber-physical spaces a reality.

26.3 SE May 8
"Show Me You Comply... Without Showing Me Anything": Zero-Knowledge Software Auditing for AI-Enabled Systems

Filippo Scaramuzza, Renato Cordeiro Ferreira, Giovanni Quattrocchi et al.

Classical software verification and validation techniques, such as procedural audits, formal methods, or model documentation, are the traditional mechanisms used to achieve the verifiable accountability now required by regulations like the EU AI Act. These methods are either expensive or heavily manual, and ill-suited for the opaque, "black box" nature of most Artificial Intelligence (AI) models. A conflict arises: high auditability and verifiability are required by law, but such transparency conflicts with the need to protect the assets being audited (e.g., confidential data and proprietary models). This paper introduces ZKMLOps, an MLOps verification framework that operationalizes Zero-Knowledge Proofs (ZKPs) within Machine Learning Operations (MLOps) lifecycles; a ZKP allows a prover to convince a verifier that a statement is true without revealing anything beyond the fact that the statement is true. By integrating ZKPs with established software engineering patterns, ZKMLOps provides a modular and repeatable process for generating verifiable cryptographic evidence (proofs of well-defined computational statements about the audited model and its inputs) that auditors can use as input to a regulatory compliance determination. We evaluate the framework along two dimensions. First, framework viability: orchestration overhead is bounded and stable across architecturally heterogeneous ZKP backends and models of increasing size. Second, cost-versus-assurance trade-offs: the audit-on-demand setting is the regime in which full zero-knowledge auditing is the appropriate tool, where it provides confidentiality and integrity guarantees that lighter-weight alternatives cannot match.
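
To make the prover/verifier idea concrete, here is a toy sketch of a Schnorr-style identification protocol, in which a prover shows knowledge of a secret exponent without sending it. This is an illustrative assumption, not the ZKP backends or the workflow used by ZKMLOps; the group parameters are deliberately tiny and insecure, and the zero-knowledge property of this interactive form holds only against an honest verifier.

```python
import secrets

# Toy parameters, far too small for real use: G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # secret in 1..Q-1
    return x, pow(G, x, P)


def commit() -> tuple[int, int]:
    """Prover picks a random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x: int, r: int, c: int) -> int:
    """Prover answers the challenge c with s = r + c*x mod Q."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier checks G^s == t * y^c mod P; x itself is never transmitted."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


secret, public = keygen()
nonce, commitment = commit()
challenge = secrets.randbelow(Q)  # chosen by the verifier
response = respond(secret, nonce, challenge)

accepted = verify(public, commitment, challenge, response)
rejected = verify(public, commitment, challenge, (response + 1) % Q)
print(accepted, rejected)  # True False
```

The verifier sees only the commitment, the challenge, and the response. Because the nonce is uniformly random, the response alone does not determine the secret.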

AI Dec 16, 2025
IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection

Roman Nekrasov, Stefano Fossati, Indika Kumara et al.

Large Language Models (LLMs) currently exhibit low success rates in generating correct and intent-aligned Infrastructure as Code (IaC). This research investigated methods to improve LLM-based IaC generation, specifically for Terraform, by systematically injecting structured configuration knowledge. To facilitate this, an existing IaC-Eval benchmark was significantly enhanced with cloud emulation and automated error analysis. Additionally, a novel error taxonomy for LLM-assisted IaC code generation was developed. A series of knowledge injection techniques was implemented and evaluated, progressing from Naive Retrieval-Augmented Generation (RAG) to more sophisticated Graph RAG approaches. These included semantic enrichment of graph components and modeling inter-resource dependencies. Experimental results demonstrated that while baseline LLM performance was poor (27.1% overall success), injecting structured configuration knowledge increased technical validation success to 75.3% and overall success to 62.6%. Despite these gains in technical correctness, intent alignment plateaued, revealing a "Correctness-Congruence Gap" where LLMs can become proficient "coders" but remain limited "architects" in fulfilling nuanced user intent.

NE Jul 12, 2024
A Scale-Invariant Diagnostic Approach Towards Understanding Dynamics of Deep Neural Networks

Ambarish Moharil, Damian Tamburri, Indika Kumara et al.

This paper introduces a scale-invariant methodology employing Fractal Geometry to analyze and explain the nonlinear dynamics of complex connectionist systems. By leveraging architectural self-similarity in Deep Neural Networks (DNNs), we quantify fractal dimensions and roughness to deeply understand their dynamics and enhance the quality of intrinsic explanations. Our approach integrates principles from Chaos Theory to improve visualizations of fractal evolution and utilizes a Graph-Based Neural Network for reconstructing network topology. This strategy aims at advancing the intrinsic explainability of connectionist Artificial Intelligence (AI) systems.

SE Dec 9, 2025
Reusability in MLOps: Leveraging Ports and Adapters to Build a Microservices Architecture for the Maritime Domain

Renato Cordeiro Ferreira, Aditya Dhinavahi, Rowanne Trapmann et al.

ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability techniques applied while building Ocean Guard, an MLES for anomaly detection in the maritime domain. In particular, it highlights the challenges and lessons learned in reusing the Ports and Adapters pattern to support building multiple microservices from a single codebase. This experience report hopes to inspire software engineers, machine learning engineers, and data scientists to apply the Hexagonal Architecture pattern to build their MLES.
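
A minimal sketch of the Ports and Adapters (Hexagonal Architecture) idea follows. It is not Ocean Guard's code: the port `SpeedSource`, both adapters, and the speed-threshold rule are hypothetical stand-ins chosen only to show how one core function can be reused behind different entry points.

```python
from typing import Protocol


class SpeedSource(Protocol):
    """Port: what the core logic needs, independent of any technology."""

    def speeds(self) -> list[float]: ...


class InMemorySource:
    """Adapter backed by a list, e.g. for tests or a batch service."""

    def __init__(self, values: list[float]) -> None:
        self._values = values

    def speeds(self) -> list[float]:
        return list(self._values)


class CsvTextSource:
    """Adapter that parses comma-separated text, standing in for a file or API."""

    def __init__(self, text: str) -> None:
        self._text = text

    def speeds(self) -> list[float]:
        return [float(v) for v in self._text.split(",") if v.strip()]


def flag_anomalies(source: SpeedSource, limit: float) -> list[float]:
    """Core logic: depends only on the port, never on a concrete adapter."""
    return [s for s in source.speeds() if s > limit]


batch_result = flag_anomalies(InMemorySource([8.0, 12.5, 41.0]), limit=30.0)
api_result = flag_anomalies(CsvTextSource("9.5, 55.0, 10.0"), limit=30.0)
print(batch_result, api_result)  # [41.0] [55.0]
```

Because `flag_anomalies` depends only on the port, each service in a shared codebase could wire in its own adapter without touching the core logic.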

SE Sep 22, 2020 (Code)
DeepIaC: Deep Learning-Based Linguistic Anti-pattern Detection in IaC

Nemania Borovits, Indika Kumara, Parvathy Krishnan et al.

Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embeddings. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.

SE May 4, 2021
QSOC: Quantum Service-Oriented Computing

Indika Kumara, Willem-Jan Van Den Heuvel, Damian A. Tamburri

Quantum computing is quickly turning from a promise to a reality, witnessing the launch of several cloud-based, general-purpose offerings and IDEs. Unfortunately, however, existing solutions typically implicitly assume intimate knowledge of quantum computing concepts and operators. This paper introduces Quantum Service-Oriented Computing (QSOC), including a model-driven methodology to allow enterprise DevOps teams to compose, configure, and operate enterprise applications without intimate knowledge of the underlying quantum infrastructure, advocating knowledge reuse, separation of concerns, resource optimization, and mixed quantum and conventional QSOC applications.

AI Apr 5, 2021
DataOps for Societal Intelligence: a Data Pipeline for Labor Market Skills Extraction and Matching

Damian Andrew Tamburri, Willem-Jan Van den Heuvel, Martin Garriga

Big Data analytics supported by AI algorithms can support skills localization and retrieval in the context of a labor market intelligence problem. We formulate and solve this problem through specific DataOps models, blending data sources from administrative and technical partners in several countries into cooperation, creating shared knowledge to support policy and decision-making. We then focus on the critical task of skills extraction from resumes and vacancies featuring state-of-the-art machine learning models. We showcase preliminary results with applied machine learning on real data from the employment agencies of the Netherlands and the Flemish region in Belgium. The final goal is to match these skills to standard ontologies of skills, jobs and occupations.

SE Feb 17, 2021
Automated Test-Case Generation for Solidity Smart Contracts: the AGSolT Approach and its Evaluation

Stefan Driessen, Dario Di Nucci, Geert Monsieur et al.

Blockchain and smart contract technology are novel approaches to data and code management that facilitate trusted computing by allowing for development in a distributed and decentralized manner. Testing smart contracts comes with its own set of challenges that have not yet been fully identified and explored. Although existing tools can identify and discover known vulnerabilities and their interactions on the Ethereum blockchain through random search or symbolic execution, these tools generally do not produce test suites suitable for human oracles. In this paper, we present AGSOLT (Automated Generator of Solidity Test Suites). We demonstrate its efficiency by implementing two search algorithms to automatically generate test suites for stand-alone Solidity smart contracts, taking into account some of the blockchain-specific challenges. To test AGSOLT, we compared a random search algorithm and a genetic algorithm on a set of 36 real-world smart contracts. We found that AGSOLT is capable of achieving high branch coverage with both approaches and even discovered errors in some of the most popular Solidity smart contracts on GitHub.

SE Jul 4, 2020
Towards Semantic Detection of Smells in Cloud Infrastructure Code

Indika Kumara, Zoe Vasileiou, Georgios Meditskos et al.

Automated deployment and management of Cloud applications rely on descriptions of their deployment topologies, often referred to as Infrastructure Code. As the complexity of applications and their deployment models increases, developers inadvertently introduce software smells into such code specifications, for instance, violations of good coding practices or of modular structure. This paper presents a knowledge-driven approach that enables developers to identify such smells in deployment descriptions. We detect smells with SPARQL-based rules over pattern-based OWL 2 knowledge graphs capturing deployment models. We show the feasibility of our approach with a prototype and three case studies.

SE Mar 25, 2020
Quality Assurance of Heterogeneous Applications: The SODALITE Approach

Indika Kumara, Giovanni Quattrocchi, Damian Tamburri et al.

A key focus of the SODALITE project is to assure the quality and performance of the deployments of applications over heterogeneous Cloud and HPC environments. It offers a set of tools to detect and correct errors, smells, and bugs in the deployment models and their provisioning workflows, and a framework to monitor and refactor deployment model instances at runtime. This paper presents the objectives, designs, and early results of the quality assurance framework and the refactoring framework.

SE Feb 10, 2020
FM4SN: A Feature-Oriented Approach to Tenant-Driven Customization of Multi-Tenant Service Networks

Indika Kumara, Jun Han, Alan Colman et al.

In a multi-tenant service network, multiple virtual service networks (VSNs), one for each tenant, coexist on the same service network. The tenants themselves need to be able to dynamically create and customize their own VSNs to support their initial and changing functional and performance requirements. These tasks are problematic for them due to: 1) the platform-specific knowledge required, 2) the existence of a large number of customization options and their dependencies, and 3) the complexity of deriving the right subset of options. In this paper, we present an approach to enable and simplify the tenant-driven customization of multi-tenant service networks. We propose to use features as a high-level customization abstraction. A regulated collaboration among a set of services in the service network realizes a feature. A software engineer can design a customization policy for a service network using the mappings between features and collaborations, and enact the policy with the controller of the service network. A tenant can then specify the requirements for its VSN as a set of functional and performance features. A customization request from a tenant triggers the customization policy of the service network, which (re)configures the corresponding VSN at runtime to realize the selected features. We show the feasibility of our approach with two case studies and a performance evaluation.
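
The feature-to-collaboration mapping described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not FM4SN itself: the feature names, service names, `POLICY` table, and `Controller` class are all hypothetical, and a "collaboration" is reduced to the set of services that take part in it.

```python
# Hypothetical customization policy: each feature maps to the services
# whose collaboration realizes it.
POLICY: dict[str, set[str]] = {
    "towing": {"dispatcher", "tow-truck"},
    "repair": {"dispatcher", "garage"},
    "fast-response": {"priority-queue"},
}


class Controller:
    """Keeps one virtual service network (VSN) configuration per tenant."""

    def __init__(self, policy: dict[str, set[str]]) -> None:
        self.policy = policy
        self.vsns: dict[str, set[str]] = {}

    def customize(self, tenant: str, features: set[str]) -> set[str]:
        """(Re)configure the tenant's VSN to realize the selected features."""
        unknown = sorted(features - set(self.policy))
        if unknown:
            raise ValueError(f"unknown features: {unknown}")
        services: set[str] = set()
        for feature in features:
            services |= self.policy[feature]
        self.vsns[tenant] = services
        return services


controller = Controller(POLICY)
first = controller.customize("tenant-a", {"towing"})
second = controller.customize("tenant-a", {"towing", "repair", "fast-response"})
print(sorted(first))   # ['dispatcher', 'tow-truck']
print(sorted(second))  # ['dispatcher', 'garage', 'priority-queue', 'tow-truck']
```

The tenant only names features; the policy decides which services are involved, so a later request from the same tenant simply replaces its VSN configuration.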

CR Nov 26, 2019
Blockchains: a Systematic Multivocal Literature Review

Bert-Jan Butijn, Damian A. Tamburri, Willem-Jan Van Den Heuvel

Blockchain technology has gained tremendous popularity both in practice and academia. The goal of this article is to develop a coherent overview of the state of the art in blockchain technology, using a systematic (i.e., protocol-based, replicable), multivocal (i.e., featuring both white and grey literature alike) literature review, to (1) define blockchain technology, (2) elaborate on its architecture options and (3) trade-offs, and (4) understand the current applications and challenges, as evident from the state of the art. We derive a systematic definition of blockchain technology, based on a formal concept analysis. Further on, we flesh out an overview of blockchain technology elaborated by means of Grounded Theory.