Javier Ron

h-index: 51

4 papers

47 citations

Novelty: 40%

AI Score: 36

Ranked #117,946 of 205,806 authors (top 57%); #1,535 in SE (top 45%)

4 Papers

52.4 CR Apr 30

zkSBOM: Privacy-Preserving SBOM Sharing with Zero-Knowledge Sets

Tom Sorger, Eric Cornelissen, Aman Sharma et al.

Software Bills of Materials (SBOMs) are increasingly mandated by regulators, yet existing sharing mechanisms impose a binary choice between full disclosure and full opacity. Full disclosure exposes software suppliers to attacks informed by the SBOM alone, for example by revealing the presence of a vulnerable dependency. Conversely, software consumers can be fooled by software suppliers who modify or misrepresent published SBOMs. We present zkSBOM, a privacy-preserving SBOM sharing mechanism designed to address these threats. zkSBOM uses zero-knowledge sets to cryptographically commit to the components within an SBOM. Software consumers can query for known vulnerabilities and receive a cryptographic proof confirming whether the artifact described by the SBOM is affected, without revealing any additional SBOM content. We conduct a security analysis of zkSBOM by quantifying expected leakage from inclusion and exclusion proofs. We demonstrate real-world feasibility by applying it to realistic scenarios and evaluating its operational requirements. Our evaluation demonstrates that zkSBOM is a strong, secure, and privacy-preserving mechanism for SBOM sharing, protecting software suppliers and software consumers from one another.

SE Jan 31, 2024

Generative AI to Generate Test Data Generators

Benoit Baudry, Khashayar Etemadi, Sen Fang et al.

Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI to generate test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) generating raw test data, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
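
The three prompt types described in this abstract could be sketched roughly as follows. This is only an illustration: the function name `build_prompts`, the prompt wording, the default language, and the default library name are all assumptions and are not taken from the paper.

```python
def build_prompts(domain, language="Python", faker_library="Faker"):
    """Build one hypothetical prompt per level of integrability.

    The wording below is illustrative only; the paper's actual
    prompts are not given in the abstract.
    """
    return {
        # Level 1: ask the model for raw test data directly.
        "raw_data": f"Generate 10 realistic examples of {domain}.",
        # Level 2: ask for a program that generates the data.
        "program": (
            f"Write a {language} program that generates realistic {domain}."
        ),
        # Level 3: ask for a program built on an existing faker library.
        "faker_program": (
            f"Write a {language} program that uses the {faker_library} "
            f"library to generate realistic {domain}."
        ),
    }


if __name__ == "__main__":
    for level, prompt in build_prompts("Portuguese street addresses").items():
        print(f"{level}: {prompt}")
```

Each returned string would then be sent to an LLM of choice; the sketch stops short of that call because the abstract does not say which models or APIs were used.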

SE Oct 30, 2021

Chaos Engineering of Ethereum Blockchain Clients

Long Zhang, Javier Ron, Benoit Baudry et al.

In this paper, we present ChaosETH, a chaos engineering approach for resilience assessment of Ethereum blockchain clients. ChaosETH operates as follows. First, it monitors Ethereum clients to determine their normal behavior. Then, it injects system call invocation errors into a single Ethereum client at a time and observes the behavior resulting from the perturbation. Finally, ChaosETH compares the behavior recorded before, during, and after perturbation to assess the impact of the injected system call invocation errors. The experiments are performed on the two most popular Ethereum client implementations: GoEthereum and Nethermind. We assess the impact of 22 different system call errors on those Ethereum clients with respect to 15 application-level metrics. Our results reveal a broad spectrum of resilience characteristics of Ethereum clients with respect to system call invocation errors, ranging from direct crashes to full resilience. The experiments clearly demonstrate the feasibility of applying chaos engineering principles to blockchain systems.
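
The final comparison step (before, during, and after perturbation) might look something like the sketch below for a single metric. The abstract does not say how ChaosETH actually compares behavior, so the mean-based check, the `tolerance` parameter, the function name `classify_impact`, and the three labels are all assumptions made for illustration.

```python
from statistics import mean


def classify_impact(before, during, after, tolerance=0.1):
    """Classify one metric's behavior across the three phases.

    Hypothetical rule: a phase "deviates" when its mean differs from
    the pre-perturbation mean by more than `tolerance` (relative).
    """
    baseline = mean(before)

    def deviates(samples):
        return abs(mean(samples) - baseline) > tolerance * abs(baseline)

    if not deviates(during):
        return "unaffected"  # error injection had no visible effect
    if not deviates(after):
        return "recovered"   # effect visible, but the metric returned to normal
    return "degraded"        # effect persists after the injection stops


if __name__ == "__main__":
    # Made-up samples of a single application-level metric.
    print(classify_impact([100, 102, 98], [50, 60, 55], [99, 101, 100]))
```

A real assessment would need one such comparison per metric and per injected error, and would likely use a more robust statistical test than a simple mean difference.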

SE Dec 12, 2020

A Software-Repair Robot based on Continual Learning

Benoit Baudry, Zimin Chen, Khashayar Etemadi et al.

Software bugs are common, and correcting them accounts for a significant part of the costs in the software development and maintenance process. This calls for automatic techniques to deal with them. One promising direction towards this goal is gaining repair knowledge from historical bug fixing examples. Retrieving insights from software development history is particularly appealing with the constant progress of machine learning paradigms and the skyrocketing 'big' bug fixing data generated through Continuous Integration (CI). In this paper, we present R-Hero, a novel software repair bot that applies continual learning to acquire bug fixing strategies from continuous streams of source code changes, implemented for the single development platform GitHub/Travis CI. We describe R-Hero, our novel system for learning how to fix bugs based on continual training, and we uncover initial successes as well as novel research challenges for the community.