Marcelo de Almeida Maia

h-index17

10papers

122citations

Novelty31%

AI Score24

Ranked #172,269 of 194,257 authors (top 89%)#2,114 in SE (top 70%)

10 Papers

3.6SEJul 20, 2021Code

On the Interplay of Smells Large Class, Complex Class and Duplicate Code

Elder Vicente de Paulo Sobrinho, Marcelo de Almeida Maia

Bad smells have been defined to describe potential problems in code, possibly pointing out refactoring opportunities. Several empirical studies have highlighted that smells have a negative impact on comprehension and maintainability. Consequently, several approaches have been proposed to detect and restructure them. However, studies on the inter-relationship of occurrence of different types of smells in source code are still lacking, especially those focused on the quantification of this inter-relationship. In this work, we aim at understand and quantify the possible the inter-relation of smells Large Class - LC, Complex Class - CC and Duplicate Code - DC. In particular, we investigate patterns of LC and CC regarding the presence or absence of duplicate code. We conduct a quantitative study on five open source projects, and also a qualitative analysis to measure and understand the association of specific smells. As one of the main results, we highlight that there are "occurrence patterns" among these smells, for example: either in Complex Class or in the co-occurrence of Large Class and Complex Class, clones tend to be more prevalent in highly complex classes than less complex classes. The found patterns could be used to improve the performance of detection tools or even help in refactoring tasks.

3.6SEOct 14, 2021

Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study

Carlos Eduardo C. Dantas, Marcelo A. Maia

Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we propose an empirical analysis to analyze the readability and understandability score from snippets extracted from the web using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluate their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing

6.4SEAug 20, 2021

Readability and Understandability Scores for Snippet Assessment: an Exploratory Study

Carlos Eduardo C. Dantas, Marcelo A. Maia

Code search engines usually use readability feature to rank code snippets. There are several metrics to calculate this feature, but developers may have different perceptions about readability. Correlation between readability and understandability features has already been proposed, i.e., developers need to read and comprehend the code snippet syntax, but also understand the semantics. This work investigate scores for understandability and readability features, under the perspective of the possible subjective perception of code snippet comprehension. We find that code snippets with higher readability score has better comprehension than lower ones. The understandability score presents better comprehension in specific situations, e.g. nested loops or if-else chains. The developers also mentioned writability aspects as the principal characteristic to evaluate code snippets comprehension. These results provide insights for future works in code comprehension score optimization.

6.4SEAug 5, 2021

Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

Rodrigo F. Silva, M. Masudur Rahman, Carlos Eduardo Dantas et al.

Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. Semantic gap may increase for similar bag of words due to antonyms and negation. Moreover, documents retrieved by search engines might not contain solutions containing both code examples and their explanations. So, we propose CRAR (Crowd Answer Recommender) to circumvent those issues aiming at improving retrieval of relevant answers from Stack Overflow containing not only the expected code examples for the given task but also their explanations. Given a programming task, we investigate the effectiveness of combining information retrieval techniques along with a set of features to enhance the ranking of important threads (i.e., the units containing questions along with their answers) for the given task and then selects relevant answers contained in those threads, including semantic features, like word embeddings and sentence embeddings, for instance, a Convolutional Neural Network (CNN). CRAR also leverages social aspects of Stack Overflow discussions like popularity to select relevant answers for the tasks. Our experimental evaluation shows that the combination of the different features performs better than each one individually. We also compare the retrieval performance with the state-of-art CROKAGE (Crowd Knowledge Answer Generator), which is also a system aimed at retrieving relevant answers from Stack Overflow. We show that CRAR outperforms CROKAGE in Mean Reciprocal Rank and Mean Recall with small and medium effect sizes, respectively.

8.6SEMar 17, 2021Code

Towards a question answering assistant for software development using a transformer-based language model

Liliane do Nascimento Vale, Marcelo de Almeida Maia

Question answering platforms, such as Stack Overflow, have impacted substantially how developers search for solutions for their programming problems. The crowd knowledge content available from such platforms has also been used to leverage software development tools. The recent advances on Natural Language Processing, specifically on more powerful language models, have demonstrated ability to enhance text understanding and generation. In this context, we aim at investigating the factors that can influence on the application of such models for understanding source code related data and produce more interactive and intelligent assistants for software development. In this preliminary study, we particularly investigate if a how-to question filter and the level of context in the question may impact the results of a question answering transformer-based model. We suggest that fine-tuning models with corpus based on how-to questions can impact positively in the model and more contextualized questions also induce more objective answers.

9.9SEMar 21, 2019

Bootstrapping Cookbooks for APIs from Crowd Knowledge on Stack Overflow

Lucas B. L. Souza, Eduardo C. Campos, Fernanda Madeiral et al.

Well established libraries typically have API documentation. However, they frequently lack examples and explanations, possibly making difficult their effective reuse. Stack Overflow is a question-and-answer website oriented to issues related to software development. Despite the increasing adoption of Stack Overflow, the information related to a particular topic (e.g., an API) is spread across the website. Thus, Stack Overflow still lacks organization of the crowd knowledge available on it. Our target goal is to address the problem of the poor quality documentation for APIs by providing an alternative artifact to document them based on the crowd knowledge available on Stack Overflow, called crowd cookbook. A cookbook is a recipe-oriented book, and we refer to our cookbook as crowd cookbook since it contains content generated by a crowd. The cookbooks are meant to be used through an exploration process, i.e. browsing. In this paper, we present a semi-automatic approach that organizes the crowd knowledge available on Stack Overflow to build cookbooks for APIs. We have generated cookbooks for three APIs widely used by the software development community: SWT, LINQ and QT. We have also defined desired properties that crowd cookbooks must meet, and we conducted an evaluation of the cookbooks against these properties with human subjects. The results showed that the cookbooks built using our approach, in general, meet those properties. As a highlight, most of the recipes were considered appropriate to be in the cookbooks and have self-contained information. We concluded that our approach is capable to produce adequate cookbooks automatically, which can be as useful as manually produced cookbooks. This opens an opportunity for API designers to enrich existent cookbooks with the different points of view from the crowd, or even to generate initial versions of new cookbooks.

13.2SEMar 18, 2019Code

Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge

Rodrigo F. G. Silva, Chanchal K. Roy, Mohammad Masudur Rahman et al.

Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might miss a succinct explanation. These problems make the developers browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations. Our proposed approach expands the task description with relevant API classes from Stack Overflow Q&A threads and then mitigates the lexical gap problems. Furthermore, we perform natural language processing on the top quality answers and then return such programming solutions containing code examples and code explanations unlike earlier studies. We evaluate our approach using 48 programming queries and show that it outperforms six baselines including the state-of-art by a statistically significant margin. Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-art tool in terms of relevance of the suggested code examples, benefit of the code explanations and the overall solution quality (code + explanation).

14.7SEJul 30, 2018Code

Towards an automated approach for bug fix pattern detection

Fernanda Madeiral, Thomas Durieux, Victor Sobreira et al.

The characterization of bug datasets is essential to support the evaluation of automatic program repair tools. In a previous work, we manually studied almost 400 human-written patches (bug fixes) from the Defects4J dataset and annotated them with properties, such as repair patterns. However, manually finding these patterns in different datasets is tedious and time-consuming. To address this activity, we designed and implemented PPD, a detector of repair patterns in patches, which performs source code change analysis at abstract-syntax tree level. In this paper, we report on PPD and its evaluation on Defects4J, where we compare the results from the automated detection with the results from the previous manual analysis. We found that PPD has overall precision of 91% and overall recall of 92%, and we conclude that PPD has the potential to detect as many repair patterns as human manual analysis.

11.4SEMar 28, 2017

On the Interplay between Non-Functional Requirements and Builds on Continuous Integration

Klérisson V. R. Paixão, Crícia Z. Felício, Fernanda M. Delfim et al.

Continuous Integration (CI) implies that a whole developer team works together on the mainline of a software project. CI systems automate the builds of a software. Sometimes a developer checks in code, which breaks the build. A broken build might not be a problem by itself, but it has the potential to disrupt co-workers, hence it affects the performance of the team. In this study, we investigate the interplay between nonfunctional requirements (NFRs) and builds statuses from 1,283 software projects. We found significant differences among NFRs related-builds statuses. Thus, tools can be proposed to improve CI with focus on new ways to prevent failures into CI, specially for efficiency and usability related builds. Also, the time required to put a broken build back on track indicates a bimodal distribution along all NFRs, with higher peaks within a day and lower peaks in six weeks. Our results suggest that more planned schedule for maintainability for Ruby, and for functionality and reliability for Java would decrease delays related to broken builds.

6.6SEJun 18, 2015

ModularityCheck: A Tool for Assessing Modularity using Co-Change Clusters

Luciana Silva, Daniel Felix, Marco Tulio Valente et al.

It is widely accepted that traditional modular structures suffer from the dominant decomposition problem. Therefore, to improve current modularity views, it is important to investigate the impact of design decisions concerning modularity in other dimensions, as the evolutionary view. In this paper, we propose the ModularityCheck tool to assess package modularity using co-change clusters, which are sets of classes that usually changed together in the past. Our tool extracts information from version control platforms and issue reports, retrieves co-change clusters, generates metrics related to co-change clusters, and provides visualizations for assessing modularity. We also provide a case study to evaluate the tool. http://youtu.be/7eBYa2dfIS8