SE | Feb 23, 2021 | Code
Automating Test Case Identification in Java Open Source Projects on GitHub
Matej Madeja, Jaroslav Porubän, Michaela Bačíková et al.
Software testing is a very important Quality Assurance (QA) component. Many researchers study the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not reveal how tests are written in real projects. In this paper, the following was investigated: (i) the denotation of the word "test" in different natural languages; (ii) whether the number of occurrences of the word "test" correlates with the number of test cases; and (iii) which testing frameworks are most used. The analysis was performed on 38 GitHub open source repositories thoroughly selected from the set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word "test" and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97% of test cases; (iii) 15% of the analyzed classes used a main() function, representing regular Java programs that test the production code without using any third-party framework. The identification of such tests is very complex due to implementation diversity. The results may be leveraged to more quickly identify and locate test cases in a repository, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
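The relationship between occurrences of the word "test" and the number of test cases can be illustrated with a minimal Python sketch. It assumes that a test case is a method marked with a JUnit-style `@Test` annotation, which is only one of the conventions such a study would have to handle; it does not reproduce the paper's detection algorithm. The function names are hypothetical.

```python
import math
import re

# Assumption: "test" is matched case-insensitively anywhere in the source text.
TEST_WORD = re.compile(r"test", re.IGNORECASE)
# Assumption: a test case is a method annotated with exactly @Test.
TEST_ANNOTATION = re.compile(r"@Test\b")


def count_test_word(source: str) -> int:
    """Count occurrences of the word "test" in one source file."""
    return len(TEST_WORD.findall(source))


def count_annotated_tests(source: str) -> int:
    """Count @Test annotations as a rough proxy for test cases."""
    return len(TEST_ANNOTATION.findall(source))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to a set of class files, `pearson_r` would take the per-class word counts and the per-class test case counts as its two arguments. Classes that test through `main()` contain no `@Test` annotation, so this proxy would miss them entirely.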
SE | Jul 26, 2018 | Code
Trend Analysis on the Metadata of Program Comprehension Papers
Matúš Sulír, Jaroslav Porubän
As program comprehension is a vast research area, it is necessary to get an overview of its rising and falling trends. We performed an n-gram frequency analysis on titles, abstracts and keywords of 1885 articles about program comprehension from the years 2000-2014. According to this analysis, the fastest-rising trends are feature location and open source systems, while the fastest-falling ones are program slicing and legacy systems.
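An n-gram frequency analysis of this kind can be sketched in a few lines of Python. The tokenization (lowercase ASCII letters only) and the function names are assumptions made for illustration; the abstract does not specify how the authors tokenized the text or measured a trend.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list:
    """Split text into lowercase words and return all word n-grams."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts_by_year(records, n: int = 2) -> dict:
    """Count n-grams per year from (year, text) pairs."""
    counts = {}
    for year, text in records:
        counts.setdefault(year, Counter()).update(ngrams(text, n))
    return counts
```

Comparing the count of a phrase such as "feature location" across years then gives a rough picture of whether it is rising or falling. A real analysis would also normalize by the number of articles published in each year.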
SE | Dec 4, 2017 | Code
A Quantitative Study of Java Software Buildability
Matúš Sulír, Jaroslav Porubän
Researchers, students and practitioners often encounter a situation when the build process of a third-party software system fails. In this paper, we aim to confirm this observation, which has so far been supported mainly by anecdotal evidence. Using a virtual environment that simulates a programmer's environment, we try to fully automatically build target archives from the source code of over 7,200 open source Java projects. We found that more than 38% of builds ended in failure. Build log analysis reveals that the largest portion of errors is dependency-related. We also conduct an association study of factors affecting build success.
SE | Dec 1, 2020
Customizing Host IDE for Non-programming Users of Pure Embedded DSLs: A Case Study
Milan Nosáľ, Jaroslav Porubän, Matúš Sulír
Pure embedding as an implementation strategy of domain-specific languages (DSLs) benefits from low implementation costs. On the other hand, it introduces undesired syntactic noise that impedes involvement of non-programming domain experts. Due to this, pure embedded DSLs are generally not intended for, nor used by, non-programmers. In this work, we try to challenge this state by experimenting with inexpensive customizations of the host IDE (Integrated Development Environment) to reduce the negative impact of syntactic noise. We present several techniques and recommendations based on standard IDE features (e.g., file templates and code folding) that aim to reduce syntactic noise and generally improve the user experience with pure embedded DSLs. The techniques are presented using a NetBeans IDE case study. The goal of the proposed techniques is to improve the user experience with pure embedded DSLs with a focus on the involvement of non-programming domain experts (or non-programmers in general). The proposed techniques were evaluated using a controlled experiment. The experiment compared a group using Ruby and non-modified RubyMine IDE versus a group using Java and NetBeans IDE customized to use the proposed techniques. Experiment results indicate that even inexpensive host IDE customizations can significantly alleviate issues caused by the syntactic noise: Java with its inflexible syntax performed better than Ruby with its concise syntax.
SE | Dec 1, 2020
Designing Voice-Controllable APIs
Matúš Sulír, Jaroslav Porubän
The main purpose of a voice command system is to process a sentence in natural language and perform the corresponding action. Although there exist many approaches to map sentences to API (application programming interface) calls, this mapping is usually performed after the API is already implemented, possibly by other programmers. In this paper, we describe how the API developer can use patterns to map sentences to API calls by utilizing the similarities between names and types in the sentences and the API. In the cases when the mapping is not straightforward, we suggest the usage of suitable annotations (attribute-oriented programming).
SE | Nov 30, 2020
Toward a Benchmark Repository for Software Maintenance Tool Evaluations with Humans
Matúš Sulír
To evaluate software maintenance techniques and tools in controlled experiments with human participants, researchers currently use projects and tasks selected on an ad-hoc basis. This can unrealistically favor their tool, and it makes the comparison of results difficult. We suggest a gradual creation of a benchmark repository with projects, tasks, and metadata relevant for human-based studies. In this paper, we discuss the requirements and challenges of such a repository, along with the steps which could lead to its construction.
SE | Nov 11, 2019
Draw This Object: A Study of Debugging Representations
Matúš Sulír, Ján Juhár
Domain-specific debugging visualizations try to provide a view of a runtime object tailored to a specific domain and highlighting its important properties. The research in this area has focused mainly on the technical aspects of the creation of such views so far. However, we still lack answers to questions such as what properties of objects are considered important for these visualizations, whether all objects have an appropriate domain-specific view, or what clues could help us to construct these views fully automatically. In this paper, we describe an exploratory study where the participants were asked to inspect runtime states of objects displayed in a traditional debugger and draw ideal domain-specific views of these objects on paper. We describe interesting observations and findings obtained during this study and a preliminary taxonomy of these visualizations.
SE | Dec 18, 2018
Integrating Runtime Values with Source Code to Facilitate Program Comprehension
Matúš Sulír
The inherently abstract nature of source code makes programs difficult to understand. In our research, we designed three techniques utilizing concrete values of variables and other expressions during program execution. RuntimeSearch is a debugger extension searching for a given string in all expressions at runtime. DynamiDoc generates documentation sentences containing examples of arguments, return values and state changes. RuntimeSamp augments source code lines in the IDE (integrated development environment) with sample variable values. In this post-doctoral article, we briefly describe these three approaches and related motivational studies, surveys and evaluations. We also reflect on the PhD study, providing advice for current students. Finally, short-term and long-term future work is described.
SE | Aug 30, 2018
IDE-Independent Program Comprehension Tools via Source File Overwriting
Matúš Sulír, Jaroslav Porubän, Ondrej Zoričák
Traditionally, we have two possibilities to design tools for program comprehension and analysis. The first option is to create a standalone program, independent of any source code editor. This way, the act of source code editing is separated from the act of viewing the code analysis results. The second option is to create a plugin for a specific IDE (integrated development environment) - in this case, a separate version must be created for each IDE. We propose an approach where information about source code elements is written directly into source files as annotations or special comments. Before committing to a version control system, the annotations are removed from the source code to avoid code pollution. We briefly evaluate the approach and delineate its limitations.
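The removal step before a commit can be sketched as a small Python filter. The marker `// @generated-info` is a hypothetical comment format chosen for this illustration; the abstract does not state which annotation or comment syntax the authors' tools write into the files.

```python
import re

# Hypothetical marker: a line comment starting with "@generated-info".
MARKER = re.compile(r"\s*//\s*@generated-info\b.*$")


def strip_generated(source: str) -> str:
    """Remove tool-written marker comments from source text.

    Trailing marker comments are cut off the end of a code line;
    lines that contained nothing but a marker comment are dropped.
    """
    kept = []
    for line in source.splitlines():
        cleaned = MARKER.sub("", line)
        # Drop a line only if removing the marker emptied it.
        if cleaned.strip() == "" and line.strip() != "":
            continue
        kept.append(cleaned)
    return "\n".join(kept)
```

Such a filter could run from a pre-commit hook so that the repository never sees the tool output. This regex-based version does not parse the language, so a marker that appears inside a string literal would also be cut; that is one of the limitations a text-level approach has to accept.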
SE | Aug 10, 2018
Recording Concerns in Source Code Using Annotations
Matúš Sulír, Milan Nosáľ, Jaroslav Porubän
A concern can be characterized as a developer's intent behind a piece of code, often not explicitly captured in it. We discuss a technique of recording concerns using source code annotations (concern annotations). Using two studies and two controlled experiments, we seek to answer the following 3 research questions: 1) Do programmers' mental models overlap? 2) How do developers use shared concern annotations when they are available? 3) Does using annotations created by others improve program comprehension and maintenance correctness, time and confidence? The first study shows that developers' mental models, recorded using concern annotations, overlap and thus can be shared. The second study shows that shared concern annotations can be used during program comprehension for the following purposes: hypotheses confirmation, feature location, obtaining new knowledge, finding relationships and maintenance notes. The first controlled experiment with students showed that the presence of annotations significantly reduced program comprehension and maintenance time by 34%. The second controlled experiment was a differentiated replication of the first one, focused on industrial developers. It showed a 33% significant improvement in correctness. We conclude that concern annotations are a viable way to share developers' thoughts.
SE | Jul 25, 2018
RuntimeSearch: Ctrl+F for a Running Program
Matúš Sulír, Jaroslav Porubän
Developers often try to find occurrences of a certain term in a software system. Traditionally, a text search is limited to static source code files. In this paper, we introduce a simple approach, RuntimeSearch, where the given term is searched in the values of all string expressions in a running program. When a match is found, the program is paused and its runtime properties can be explored with a traditional debugger. The feasibility and usefulness of RuntimeSearch are demonstrated on a medium-sized Java project.
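The idea of searching runtime values can be illustrated with a rough Python analogue. This is not the authors' Java debugger extension: it inspects only local string variables rather than all string expressions, and it records matches instead of pausing the program. The function name `runtime_search` is hypothetical.

```python
import sys


def runtime_search(needle, func, *args, **kwargs):
    """Run func and report where a local string variable contains needle.

    Returns a list of (function name, line number, variable name) tuples.
    """
    hits = []

    def tracer(frame, event, arg):
        # Inspect local variables whenever a line runs or a function returns.
        if event in ("line", "return"):
            for name, value in frame.f_locals.items():
                if isinstance(value, str) and needle in value:
                    hits.append((frame.f_code.co_name, frame.f_lineno, name))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        # Always restore the previous trace function.
        sys.settrace(previous)
    return hits
```

Calling `runtime_search("world", some_function)` returns the locations where a local variable held a string containing "world" while `some_function` ran. A debugger-based tool would stop at the first such location instead of collecting them.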
SE | Jun 19, 2018
Augmenting Source Code Lines with Sample Variable Values
Matúš Sulír, Jaroslav Porubän
Source code is inherently abstract, which makes it difficult to understand. Activities such as debugging can reveal concrete runtime details, including the values of variables. However, they require that a developer explicitly requests these data for a specific execution moment. We present a simple approach, RuntimeSamp, which collects sample variable values during normal executions of a program by a programmer. These values are then displayed in an ambient way at the end of each line in the source code editor. We discuss questions which should be answered for this approach to be usable in practice, such as how to efficiently record the values and when to display them. We provide partial answers to these questions and suggest future research directions.
SE | Apr 5, 2018
Visual augmentation of source code editors: A systematic mapping study
Matúš Sulír, Michaela Bačíková, Sergej Chodarev et al.
Source code written in textual programming languages is typically edited in integrated development environments or specialized code editors. These tools often display various visual items, such as icons, color highlights or more advanced graphical overlays directly in the main editable source code view. We call such visualizations source code editor augmentation. In this paper, we present a first systematic mapping study of source code editor augmentation tools and approaches. We manually reviewed the metadata of 5,553 articles published during the last twenty years in two phases -- keyword search and references search. The result is a list of 103 relevant articles and a taxonomy of source code editor augmentation tools with seven dimensions, which we used to categorize the resulting list of the surveyed articles. We also provide the definition of the term source code editor augmentation, along with a brief overview of historical development and augmentations available in current industrial IDEs.