Cindy Rubio-González

SE
3papers
136citations
Novelty35%
AI Score24

3 Papers

SEMar 15, 2019Code
BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

David A. Tomassi, Naji Dmeiri, Yichen Wang et al.

Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide a substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.

SEOct 29, 2019
A Note About: Critical Review of BugSwarm for Fault Localization and Program Repair

David A. Tomassi, Cindy Rubio-González

Datasets play an important role in the advancement of software tools and facilitate their evaluation. BugSwarm is an infrastructure to automatically create a large dataset of real-world reproducible failures and fixes. In this paper, we respond to Durieux and Abreu's critical review of the BugSwarm dataset, referred to in this paper as CriticalReview. We replicate CriticalReview's study and find several incorrect claims and assumptions about the BugSwarm dataset. We discuss these incorrect claims and other contributions listed by CriticalReview. Finally, we discuss general misconceptions about BugSwarm, and our vision for the use of the infrastructure and dataset.

SEFeb 21, 2018
Path-Based Function Embedding and its Application to Specification Mining

Daniel DeFreez, Aditya V. Thakur, Cindy Rubio-González

Identifying the relationships among program elements is useful for program understanding, debugging, and analysis. One such relationship is synonymy. Function synonyms are functions that play a similar role in code, e.g. functions that perform initialization for different device drivers, or functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neural network on sentences generated from random walks over an encoding of the program as a labeled pushdown system (l-PDS). We demonstrate that func2vec is effective at identifying function synonyms in the Linux kernel. Furthermore, we show how function synonyms enable mining error-handling specifications with high support in Linux file systems and drivers.