Fernando Brito e Abreu

SE
h-index: 2
9 papers
17 citations
Novelty: 25%
AI Score: 23

9 Papers

SE | Dec 16, 2020 | Code
Code smells detection and visualization: A systematic literature review

José Pereira dos Reis, Fernando Brito e Abreu, Glauco de Figueiredo Carneiro et al.

Context: Code smells (CS) tend to compromise software quality and also demand more effort from developers to maintain and evolve the application throughout its life-cycle. They have long been catalogued with corresponding mitigating solutions called refactoring operations. Objective: This SLR has a twofold goal: the first is to identify the main code smell detection techniques and tools discussed in the literature, and the second is to analyze to what extent visual techniques have been applied to support the former. Method: Over 83 primary studies indexed in major scientific repositories were identified by our search string in this SLR. Then, following existing best practices for secondary studies, we applied inclusion/exclusion criteria to select the most relevant works, extract their features and classify them. Results: We found that the most commonly used approaches to code smell detection are search-based (30.1%) and metric-based (24.1%). Most of the studies (83.1%) use open-source software, with the Java language occupying the first position (77.1%). In terms of code smells, God Class (51.8%), Feature Envy (33.7%), and Long Method (26.5%) are the most covered ones. Machine learning techniques are used in 35% of the studies. Around 80% of the studies only detect code smells, without providing visualization techniques. Visualization-based approaches use several methods, such as city metaphors and 3D visualization techniques. Conclusions: We confirm that the detection of CS is a non-trivial task, and there is still a lot of work to be done in terms of: reducing the subjectivity associated with the definition and detection of CS; increasing the diversity of detected CS and of supported programming languages; constructing and sharing oracles and datasets to facilitate the replication of CS detection and visualization techniques validation experiments.

DC | Feb 12, 2012 | Code
The cloud paradigm: Are you tuned for the lyrics?

Fernando Brito e Abreu

Major players, business angels and opinion-makers are broadcasting beguiling lyrics on the most recent IT hype: your software should ascend to the clouds. There are many clouds and the stakes are high. Distractedly, many of us became assiduous users of the cloud, but perhaps due to legacy systems and legacy knowledge, IT professionals, especially the many who work on business information systems for the long tail, are not as deeply engaged in producing cloud-based systems for their clients. This keynote will delve into several aspects of this cloud paradigm, from more generic concerns regarding security and value for money, to more specific worries that reach software engineers in general. Do we need a different software development process? Are development techniques and tools mature enough? What about the role of open-source in the cloud? How do we assess quality in cloud-based development? Please stay tuned for more!

IR | Oct 24, 2024
Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience

Diogo Cosme, António Galvão, Fernando Brito e Abreu

Purpose: Our research project focuses on improving the content update of the online European Smart Tourism Tools (STTs) Observatory by incorporating and categorizing STTs. The categorization is based on their taxonomy, and it facilitates the end user's search process. The use of a Smart ETL (Extract, Transform, and Load) process, where "Smart" indicates the use of Artificial Intelligence (AI), is central to this endeavor. Methods: The contents describing STTs are derived from PDF catalogs, where PDF-scraping techniques extract QR codes, images, links, and text information. Duplicate STTs between the catalogs are removed, and the remaining ones are classified based on their text information using Large Language Models (LLMs). Finally, the data is transformed to comply with the Dublin Core metadata structure (the observatory's metadata structure), chosen for its wide acceptance and flexibility. Results: The Smart ETL process to import STTs to the observatory combines PDF-scraping techniques with LLMs for text content-based classification. Our preliminary results have demonstrated the potential of LLMs for text content-based classification. Conclusion: The proposed approach's feasibility is a step towards efficient content-based classification, not only in Smart Tourism but also adaptable to other fields. Future work will mainly focus on refining this classification process.
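
The transform stage described in this abstract (remove duplicates, classify, map to Dublin Core) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields, the name-normalization rule, the taxonomy labels, and the keyword-based `classify` stub standing in for the LLM call are all hypothetical and not taken from the paper. Only the `dc:` element names come from the Dublin Core element set.

```python
import re

def normalize(name):
    """Lowercase and collapse non-alphanumerics so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def deduplicate(records):
    """Keep the first record seen for each normalized tool name."""
    seen = {}
    for rec in records:
        seen.setdefault(normalize(rec["name"]), rec)
    return list(seen.values())

def classify(text):
    """Hypothetical stand-in for the LLM classifier: a plain keyword lookup."""
    keywords = {"Navigation": ("map", "route"), "Booking": ("ticket", "reservation")}
    lowered = text.lower()
    for label, words in keywords.items():
        if any(w in lowered for w in words):
            return label
    return "Unclassified"

def to_dublin_core(rec):
    """Map one scraped record onto a few Dublin Core elements."""
    return {
        "dc:title": rec["name"],
        "dc:description": rec["text"],
        "dc:identifier": rec["url"],
        "dc:subject": classify(rec["text"]),
        "dc:source": rec["catalog"],
    }

# Made-up rows, as if scraped from two PDF catalogs.
catalog_rows = [
    {"name": "City Guide App", "text": "Interactive map and route planner.",
     "url": "https://example.org/a", "catalog": "catalog-1.pdf"},
    {"name": "city guide  app!", "text": "Interactive map and route planner.",
     "url": "https://example.org/a", "catalog": "catalog-2.pdf"},
    {"name": "Museum Pass", "text": "Online ticket reservation.",
     "url": "https://example.org/b", "catalog": "catalog-2.pdf"},
]
items = [to_dublin_core(r) for r in deduplicate(catalog_rows)]
```

In a real pipeline the `classify` stub would be replaced by a prompt to an LLM constrained to the observatory's taxonomy, and duplicate detection would likely need fuzzier matching than exact normalized names.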

SE | Jan 17, 2021
Profiling Software Developers with Process Mining and N-Gram Language Models

João Caldeira, Fernando Brito e Abreu, Jorge Cardoso et al.

Context: Profiling developers is challenging since many factors, such as their skills, experience, development environment and behaviors, may influence a detailed analysis and the delivery of coherent interpretations. Objective: We aim at profiling software developers by mining their software development process. To do so, we performed a controlled experiment where, in the realm of a Python programming contest, a group of developers had the same well-defined set of requirements specifications and a well-defined sprint schedule. Events were collected from the PyCharm IDE, and from the Mooshak automatic jury where subjects checked in their code. Method: We used n-gram language models and text mining to characterize developers' profiles, and process mining algorithms to discover their overall workflows and extract the corresponding metrics for further evaluation. Results: Findings show that we can clearly characterize most developers with a coherent rationale, and distinguish the top performers from the ones with more challenging behaviors. This approach may ultimately lead to the creation of a catalog of software development process smells. Conclusions: The profile of a developer gives a software project manager a clue for selecting the tasks he/she should be assigned. With the increasing usage of low-code and no-code platforms, where code is automatically generated from an upper abstraction layer, mining developers' actions in the development platforms is a promising approach not only to detect behaviors early but also to assess project complexity and model effort.
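
To give a feel for how an n-gram language model can score developer behavior, here is a minimal bigram sketch over sequences of IDE events. The event names, the add-one smoothing, and the toy training data are assumptions made for illustration; the abstract does not state the n-gram order or smoothing the authors used.

```python
from collections import Counter
import math

def train_bigram(sequences):
    """Count bigrams over event sequences, each padded with a start marker."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, vocab

def log_prob(seq, model):
    """Log-probability of a sequence under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab = model
    padded = ["<s>"] + list(seq)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        total += math.log(p)
    return total

# Hypothetical event logs from two work sessions.
model = train_bigram([
    ["edit", "run", "submit"],
    ["edit", "run", "edit", "run", "submit"],
])
typical = log_prob(["edit", "run", "submit"], model)
atypical = log_prob(["submit", "submit", "edit"], model)
```

A session whose events follow the common edit, run, submit rhythm receives a higher log-probability than one of the same length with an unusual order, which is the kind of signal that could separate behavioral profiles.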

SE | Dec 31, 2020
PHP code smells in web apps: survival and anomalies

Américo Rio, Fernando Brito e Abreu

Context: Code smells are considered symptoms of poor design, leading to future problems, such as reduced maintainability. Except for anecdotal cases (e.g. code dropout), a code smell survives until it gets explicitly refactored or removed. This paper presents a longitudinal study on the survival of code smells for web apps built with PHP. Objectives: Our research questions are: (i) does the survival of code smells depend on their scope? (ii) have practitioners' attitudes towards code smell removal in web apps changed over time? (iii) how long do code smells survive in web applications? (iv) are there sudden variations (anomalies) in the density of code smells through the evolution of web apps? Method: We analyze the evolution of 6 code smells in 8 web applications written in PHP at the server side, across several years, using the survival analysis technique. We classify code smells according to scope in two categories: scattered and localized. Scattered code smells are expected to be more harmful since their influence is not circumscribed as in localized code smells. We split the observations for each web app into two equal and consecutive timeframes, to test the hypothesis that code smell awareness has increased over time. As for the anomalies, we standardize their detection criteria. Results: We present some evidence that code smell survival depends on their scope: the average survival rate decreases in some of them, while the opposite is observed for the remainder. The survival of localized code smells is around 4 years, while the scattered ones live around 5 years. Around 60% of the smells are removed, and some live through the whole life of the application. We also show how a graphical representation of anomalies found in the evolution of code smells allows unveiling the story of a development project and makes managers aware of the need for enforcing regular refactoring practices.
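
Survival analysis of the kind mentioned in this abstract is commonly done with the Kaplan-Meier estimator, sketched below in plain Python. The abstract does not name the estimator the authors used, so this is an assumed, generic illustration; the durations are invented. A smell still present at the last observed release is treated as censored (`removed=False`).

```python
def kaplan_meier(durations, removed):
    """Return [(time, survival_probability)] at each time a removal occurred.

    durations: observed lifetime of each smell (e.g. in years)
    removed:   True if the smell was removed, False if censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, r in zip(durations, removed) if d == t and r)
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # removed and censored smells both leave the risk set
    return curve

# Invented lifetimes (years) for six smell instances.
years = [1, 2, 2, 3, 4, 5]
was_removed = [True, True, False, True, False, True]
curve = kaplan_meier(years, was_removed)
```

Comparing the curves of two groups, such as scattered versus localized smells, is then a matter of running the estimator on each group and applying a test such as the log-rank test.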

SE | Dec 23, 2020
Crowdsmelling: The use of collective knowledge in code smells detection

José Pereira dos Reis, Fernando Brito e Abreu, Glauco de Figueiredo Carneiro

Code smells are seen as a major source of technical debt and, as such, should be detected and removed. However, researchers argue that the subjectiveness of the code smells detection process is a major hindrance to mitigating the problem of smells-infected code. We proposed the crowdsmelling approach based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smells detection algorithms, thereby lessening the subjectivity issue. This paper presents the results of a validation experiment for the crowdsmelling approach. In the context of three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of 3 code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Good performances were obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower for Feature Envy (ROC=0.570 for Random Forest). The results suggest that crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.
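
The ROC figures quoted above are areas under the ROC curve. As a reminder of what that number measures, the sketch below computes it directly as the probability that a randomly chosen smelly item is scored above a randomly chosen clean one (ties count as half). The labels and scores are invented; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented oracle labels (1 = smell present) and classifier scores.
oracle = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
auc = roc_auc(oracle, predicted)
```

A value of 0.5 corresponds to random ranking, which puts the reported 0.570 for Feature Envy close to chance, while values near 0.9 indicate that the model ranks most smelly items above clean ones.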

SE | Oct 29, 2020
Unveiling process insights from refactoring practices

João Caldeira, Fernando Brito e Abreu, Jorge Cardoso et al.

Context: Software comprehension and maintenance activities, such as refactoring, are said to be negatively impacted by software complexity. The methods used to measure software product and process complexity have been thoroughly debated in the literature. However, the understanding of the possible links between these two dimensions, particularly of the benefits of using the process perspective, has a long journey ahead. Objective: To improve the understanding of the link between developers' activities and software complexity within a refactoring task, namely by evaluating whether process metrics gathered from the IDE, using process mining methods and tools, are suitable to accurately classify different refactoring practices and the resulting software complexity. Method: We mined source code metrics from a software product after a quality improvement task was given in parallel to 117 software developers, organized in 71 teams. Simultaneously, we collected events from their 320 IDE work sessions and used process mining to model their processes and extract the corresponding metrics. Results: Most teams using a plugin for refactoring (JDeodorant) reduced software complexity more effectively and with simpler processes than the ones that performed refactoring using only Eclipse native features. We were able to find moderate correlations (43%) between software cyclomatic complexity and process cyclomatic complexity. The best models found for the refactoring method and cyclomatic complexity level predictions had an accuracy of 92.95% and 94.36%, respectively. Conclusions: Our approach is agnostic to programming languages, geographic location, or development practices. Initial findings are encouraging, and lead us to suggest that practitioners may use our method in other development tasks, such as defect analysis and unit or integration tests.
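
One plausible way to read "process cyclomatic complexity" is McCabe's formula, M = E - N + 2, applied to a graph of the activities observed in the IDE. The sketch below builds a directly-follows graph from event traces and applies that formula. Both the graph construction and the single-connected-component assumption are illustrative choices, not a description of the authors' tooling, and the event names are made up.

```python
def directly_follows_graph(traces):
    """Nodes are activities; edges are distinct consecutive activity pairs."""
    nodes, edges = set(), set()
    for trace in traces:
        nodes.update(trace)
        edges.update(zip(trace, trace[1:]))
    return nodes, edges

def cyclomatic_complexity(nodes, edges):
    """McCabe's M = E - N + 2, assuming one connected component."""
    return len(edges) - len(nodes) + 2

# Made-up IDE sessions: the second one loops back from build to edit.
sessions = [
    ["open", "edit", "build", "commit"],
    ["open", "edit", "build", "edit", "build", "commit"],
]
nodes, edges = directly_follows_graph(sessions)
complexity = cyclomatic_complexity(nodes, edges)
```

The straight-through session alone yields a complexity of 1. Adding the session with the build-to-edit loop raises it to 2, reflecting one extra independent path through the process.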

SE | Jul 20, 2020
Software Development Analytics in Practice: A Systematic Literature Review

João Caldeira, Fernando Brito e Abreu, Jorge Cardoso et al.

Context: Software Development Analytics is a research area concerned with providing insights to improve product deliveries and processes. Many types of studies, data sources and mining methods have been used for that purpose. Objective: This systematic literature review aims at providing an aggregate view of the relevant studies on Software Development Analytics in the past decade, with an emphasis on its application in practical settings. Method: Definition and execution of a search string upon several digital libraries, followed by quality assessment criteria to identify the most relevant papers. From those, we extracted a set of characteristics (study type, data source, study perspective, development life-cycle activities covered, stakeholders, mining methods, and analytics scope) and classified their impact against a taxonomy. Results: Source code repositories, experimental case studies, and developers are the most common data sources, study types, and stakeholders, respectively. Product and project managers are also often present, but less than expected. Mining methods are evolving rapidly and that is reflected in the long list identified. Descriptive statistics are the most usual method, followed by correlation analysis. Since software development is an important process in every organization, it was unexpected to find that process mining was present in only one study. Most contributions to the software development life cycle were given in the quality dimension. Time management and cost control were lightly debated. The analysis of security aspects suggests it is a topic of increasing concern for practitioners. Risk management contributions are scarce. Conclusions: There is a wide improvement margin for software development analytics in practice. One example is mining and analyzing the activities performed by software developers in their actual workbench, the IDE.

SE | Apr 29, 2012
An Eclipse Plugin to Support Code Smells Detection

Tiago Pessoa, Fernando Brito e Abreu, Miguel Pessoa Monteiro et al.

Eradication of code smells is often pointed out as a way to improve readability, extensibility and design in existing software. However, code smell detection in large systems remains time-consuming and error-prone, partly due to the inherent subjectivity of the detection processes presently available. To mitigate the subjectivity problem, this paper presents a tool that automates a technique for the detection and assessment of code smells in Java source code, developed as an Eclipse plug-in. The technique is based upon a Binary Logistic Regression model and calibrated with expert knowledge. A short overview of the technique is provided and the tool is described.
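
A binary logistic regression model turns a weighted sum of code metrics into a probability that a smell is present. The sketch below shows only that prediction step. The metric names, coefficients, and intercept are hypothetical placeholders; in the approach described above, such values would come from calibrating the model against experts' judgments.

```python
import math

def smell_probability(metrics, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value for name, value in metrics.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical calibrated parameters for a Long Method detector.
coefficients = {"loc": 0.02, "cyclomatic": 0.1}
intercept = -4.0

borderline = smell_probability({"loc": 100, "cyclomatic": 20}, coefficients, intercept)
suspicious = smell_probability({"loc": 250, "cyclomatic": 30}, coefficients, intercept)
```

With these placeholder values the first method sits exactly on the decision boundary, while the second receives a probability above 0.98. A tool can then flag entities whose probability exceeds a chosen threshold.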