Marco Vieira

CR
9papers
37citations
Novelty27%
AI Score40

9 Papers

SEMay 13Code
LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

Hrushitha Goud Tigulla, Marco Vieira

Malformed, missing, or boundary-value inputs in microservice APIs can cascade across dependent services, threatening reliability. Robustness testing systematically exercises such inputs to expose server-side failures, but generating diverse, effective tests remains challenging. Large Language Models can generate such tests from API specifications; however, it is unknown whether different models and prompt strategies produce diverse failure sets or converge on the same failures. We report a controlled experiment applying 7 prompt strategies to 3 open-source LLMs (14B-70B parameters) targeting 2 architecturally distinct microservice systems: one Java monolingual (6 services, 9 failure modes) and one polyglot (27 services, 14 failure modes), yielding 38 valid runs and 663 generated tests. We find that prompt strategy explains more variation in diversity than model size: a Structured prompt collapses diversity entirely, while a single model varied across three prompt strategies achieves complete failure-mode coverage on one system, outperforming any multi-model ensemble under a fixed prompt. We introduce two strategies, Guided and GuidedFewShot, that embed a mutation taxonomy from prior robustness testing research as domain context. GuidedFewShot achieves the highest single-run coverage on both systems (5 of 9 and 8 of 14 failure modes) while maintaining low cross-model similarity. A key lesson is that taxonomy rules alone are insufficient: LLMs cannot distinguish key-absent from value-empty mutations without concrete examples. Findings replicate across both systems.

SESep 16, 2024
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching

Arastoo Zibaeirad, Marco Vieira

Large Language Models (LLMs) have shown promise in tasks like code translation, prompting interest in their potential for automating software vulnerability detection (SVD) and patching (SVP). To further research in this area, establishing a benchmark is essential for evaluating the strengths and limitations of LLMs in these tasks. Despite their capabilities, questions remain regarding whether LLMs can accurately analyze complex vulnerabilities and generate appropriate patches. This paper introduces VulnLLMEval, a framework designed to assess the performance of LLMs in identifying and patching vulnerabilities in C code. Our study includes 307 real-world vulnerabilities extracted from the Linux kernel, creating a well-curated dataset that includes both vulnerable and patched code. This dataset, based on real-world code, provides a diverse and representative testbed for evaluating LLM performance in SVD and SVP tasks, offering a robust foundation for rigorous assessment. Our results reveal that LLMs often struggle with distinguishing between vulnerable and patched code. Furthermore, in SVP tasks, these models tend to oversimplify the code, producing solutions that may not be directly usable without further refinement.

SEDec 14, 2025Code
Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better?

Arastoo Zibaeirad, Marco Vieira

Large Language Models (LLMs) are increasingly being studied for Software Vulnerability Detection (SVD) and Repair (SVR). Individual LLMs have demonstrated code understanding abilities, but they frequently struggle when identifying complex vulnerabilities and generating fixes. This study presents DVDR-LLM, an ensemble framework that combines outputs from diverse LLMs to determine whether aggregating multiple models reduces error rates. Our evaluation reveals that DVDR-LLM achieves 10-12% higher detection accuracy compared to the average performance of individual models, with benefits increasing as code complexity grows. For multi-file vulnerabilities, the ensemble approach demonstrates significant improvements in recall (+18%) and F1 score (+11.8%) over individual models. However, the approach raises measurable trade-offs: reducing false positives in verification tasks while simultaneously increasing false negatives in detection tasks, requiring careful decision on the required level of agreement among the LLMs (threshold) for increased performance across different security contexts. Artifact: https://github.com/Erroristotle/DVDR_LLM

CRJun 15, 2020
A Model-Based Approach to Anomaly Detection Trading Detection Time and False Alarm Rate

Charles F. Gonçalves, Daniel S. Menasché, Alberto Avritzer et al.

The complexity and ubiquity of modern computing systems is a fertile ground for anomalies, including security and privacy breaches. In this paper, we propose a new methodology that addresses the practical challenges to implement anomaly detection approaches. Specifically, it is challenging to define normal behavior comprehensively and to acquire data on anomalies in diverse cloud environments. To tackle those challenges, we focus on anomaly detection approaches based on system performance signatures. In particular, performance signatures have the potential of detecting zero-day attacks, as those approaches are based on detecting performance deviations and do not require detailed knowledge of attack history. The proposed methodology leverages an analytical performance model and experimentation and allows to control the rate of false positives in a principled manner. The methodology is evaluated using the TPCx-V workload, which was profiled during a set of executions using resource exhaustion anomalies that emulate the effects of anomalies affecting system performance. The proposed approach was able to successfully detect the anomalies, with a low number of false positives (precision 90%-98%).

CRSep 3, 2019
Towards Models for Availability and Security Evaluation of Cloud Computing with Moving Target Defense

Matheus Torquato, Marco Vieira

Security is one of the most relevant concerns in cloud computing. With the evolution of cyber-security threats, developing innovative techniques to thwart attacks is of utmost importance. One recent method to improve cloud computing security is Moving Target Defense (MTD). MTD makes use of dynamic reconfiguration in virtualized environments to "confuse" attackers or to nullify their knowledge about the system state. However, there is still no consolidated mechanism to evaluate the trade-offs between availability and security when using MTD on cloud computing. The evaluation through measurements is complex as one needs to deal with unexpected events as failures and attacks. To overcome this challenge, we intend to propose a set of models to evaluate the availability and security of MTD in cloud computing environments. The expected results include the quantification of availability and security levels under different conditions (e.g., different software aging rates, varying workloads, different attack intensities).

LGSep 6, 2018
Propheticus: Generalizable Machine Learning Framework

João R. Campos, Marco Vieira, Ernesto Costa

Due to recent technological developments, Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been successfully used to process and extract knowledge from a variety of complex problems. However, a thorough ML approach is complex and highly dependent on the problem at hand. Additionally, implementing the logic required to execute the experiments is no small nor trivial deed, consequentially increasing the probability of faulty code which can compromise the results. Propheticus is a data-driven framework which results of the need for a tool that abstracts some of the inherent complexity of ML, whilst being easy to understand and use, as well as to adapt and expand to assist the user's specific needs. Propheticus systematizes and enforces various complex concepts of an ML experiment workflow, taking into account the nature of both the problem and the data. It contains functionalities to execute all the different tasks, from data preprocessing, to results analysis and comparison. Notwithstanding, it can be fairly easily adapted to different problems due to its flexible architecture, and customized as needed to address the user's needs.

CROct 5, 2014
On Benchmarking Intrusion Detection Systems in Virtualized Environments

Aleksandar Milenkoski, Samuel Kounev, Alberto Avritzer et al.

Modern intrusion detection systems (IDSes) for virtualized environments are deployed in the virtualization layer with components inside the virtual machine monitor (VMM) and the trusted host virtual machine (VM). Such IDSes can monitor at the same time the network and host activities of all guest VMs running on top of a VMM being isolated from malicious users of these VMs. We refer to IDSes for virtualized environments as VMM-based IDSes. In this work, we analyze state-of-the-art intrusion detection techniques applied in virtualized environments and architectures of VMM-based IDSes. Further, we identify challenges that apply specifically to benchmarking VMM-based IDSes focussing on workloads and metrics. For example, we discuss the challenge of defining representative baseline benign workload profiles as well as the challenge of defining malicious workloads containing attacks targeted at the VMM. We also discuss the impact of on-demand resource provisioning features of virtualized environments (e.g., CPU and memory hotplugging, memory ballooning) on IDS benchmarking measures such as capacity and attack detection accuracy. Finally, we outline future research directions in the area of benchmarking VMM-based IDSes and of intrusion detection in virtualized environments in general.

CROct 5, 2014
Technical Information on Vulnerabilities of Hypercall Handlers

Aleksandar Milenkoski, Marco Vieira, Bryan D. Payne et al.

Modern virtualized service infrastructures expose attack vectors that enable attacks of high severity, such as attacks targeting hypervisors. A malicious user of a guest VM (virtual machine) may execute an attack against the underlying hypervisor via hypercalls, which are software traps from a kernel of a fully or partially paravirtualized guest VM to the hypervisor. The exploitation of a vulnerability of a hypercall handler may have severe consequences such as altering hypervisor's memory, which may result in the execution of malicious code with hypervisor privilege. Despite the importance of vulnerabilities of hypercall handlers, there is not much publicly available information on them. This significantly hinders advances towards securing hypercall interfaces. In this work, we provide in-depth technical information on publicly disclosed vulnerabilities of hypercall handlers. Our vulnerability analysis is based on reverse engineering the released patches fixing the considered vulnerabilities. For each analyzed vulnerability, we provide background information essential for understanding the vulnerability, and information on the vulnerable hypercall handler and the error causing the vulnerability. We also show how the vulnerability can be triggered and discuss the state of the targeted hypervisor after the vulnerability has been triggered.

SEApr 19, 2012
EDCC 2012 - Fast Abstracts & Student Forum Proceedings

Marco Vieira, Ilir Gashi

Fast Abstracts at EDCC 2012 are short presentations, aiming to serve as a rapid and flexible mechanism to report on current work that may or may not be complete, introduce new ideas to the community, and state positions on controversial issues or open problems. This way, fast abstracts provide an opportunity to introduce new work, or present radical opinions, and receive early feedback from the community. Contributions are welcome from both academia and industry. The goal of the Student Forum is to encourage students to attend EDCC 2012 and present their work, exchange ideas with researchers and practitioners, and get early feedback on their research efforts. All papers were peer-reviewed by at least three program committee members, and the authors were provided with detailed comments on their work. In the end we had one accepted paper for the Student forum.