49.5SEMay 29
Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell DetectionSaymon Souza, Amanda Santana, Eduardo Figueiredo et al.
Code smells are symptoms of potential code quality problems that may affect software maintainability, thus increasing development costs and impacting software reliability. Large language models (LLMs) have shown remarkable capabilities for supporting various software engineering activities, but their use for detecting code smells remains underexplored. However, unlike the rigid rules of static analysis tools, LLMs can support flexible and adaptable detection strategies tailored to the unique properties of code smells. This paper evaluates the effectiveness of four LLMs -- DeepSeek-R1, GPT-5 mini, Llama-3.3, and Qwen2.5-Code -- for detecting nine code smells across 30 Java projects. For the empirical evaluation, we created a ground-truth dataset by asking 76 developers to manually inspect 268 code-smell candidates. Our results indicate that LLMs perform strongly for structurally straightforward smells, such as Large Class and Long Method. However, we also observed that different LLMs and tools fare better for distinct code smells. We then propose and evaluate a detection strategy that combines LLMs and static analysis tools. The proposed strategy outperforms LLMs and tools in five out of nine code smells in terms of F1-Score. However, it also generates more false positives for complex smells. Therefore, we conclude that the optimal strategy depends on whether Recall or Precision is the main priority for code smell detection.
42.7SEMay 15Code
What's Inside a GitHub Repository? An Empirical Study on the Contents of 10K ProjectsAndre Hora, João Eduardo Montandon, Diego Elias Costa
GitHub is the largest code hosting platform, with millions of repositories spanning multiple technologies. Despite this, little is known about the actual contents of GitHub's repositories in the wild. This paper presents an initial empirical analysis to better understand the contents of real-world GitHub repositories. We analyze the files, directories, and extensions present in 10,000 GitHub repositories, as well as their evolution over ten years. Our results show major changes in GitHub over the last decade: (1) the consolidation of README.md, .gitignore, and LICENSE as standard artifacts; (2) the rise of GitHub Actions as the dominant CI/CD platform; (3) the growth of configuration formats such as TOML, YAML, and JSON, alongside a decline in XML; (4) new trends, such as the growth of Dockerfile; and (5) emerging content related to LLMs and generative AI (e.g., AGENTS.md). Based on our findings, we discuss implications, including that open source is not only evolving organically but also increasingly guided by GitHub's standards, the rise and fall of technologies, and the potential support for mining software repository studies.
SEApr 12, 2020Code
Are Game Engines Software Frameworks? A Three-perspective StudyCristiano Politowski, Fabio Petrillo, João Eduardo Montandon et al.
Game engines help developers create video games and avoid duplication of code and effort, like frameworks for traditional software systems. In this paper, we explore open-source game engines along three perspectives: literature, code, and human. First, we explore and summarise the academic literature on game engines. Second, we compare the characteristics of the 282 most popular engines and the 282 most popular frameworks in GitHub. Finally, we survey 124 engine developers about their experience with the development of their engines. We report that: (1) Game engines are not well-studied in software-engineering research with few studies having engines as object of research. (2) Open-source game engines are slightly larger in terms of size and complexity and less popular and engaging than traditional frameworks. Their programming languages differ greatly from frameworks. Engine projects have shorter histories with less releases. (3) Developers perceive game engines as different from traditional frameworks. Generally, they build game engines to (a) better control the environment and source code, (b) learn about game engines, and (c) develop specific games. We conclude that open-source game engines have differences compared to traditional open-source frameworks although this differences do not demand special treatments.
SEFeb 13, 2022
Video Game Project Management Anti-patternsGabriel C. Ullmann, Cristiano Politowski, Yann-Gaël Guéhéneuc et al.
Project Management anti-patterns are well-documented in the software-engineering literature, and studying them allows understanding their impacts on teams and projects. The video game development industry is known for its mismanagement practices, and therefore applying this knowledge would help improving game developers' productivity and well-being. In this paper, we map project management anti-patterns to anti-patterns reported by game developers in the gray literature. We read 440 postmortems problems, identified anti-pattern candidates, and related them with definitions from the software-engineering literature. We discovered that most anti-pattern candidates could be mapped to anti-patterns in the software-engineering literature, except for Feature Creep, Feature Cuts, Working on Multiple Projects, and Absent or Inadequate Tools. We discussed the impact of the unmapped candidates on the development process while also drawing a parallel between video games and traditional software development. Future works include validating the definitions of the candidates via survey with practitioners and also considering development anti-patterns.
SENov 4, 2020
What Skills do IT Companies look for in New Developers? A Study with Stack Overflow JobsJoão Eduardo Montandon, Cristiano Politowski, Luciana Lourdes Silva et al.
Context: There is a growing demand for information on how IT companies look for candidates to their open positions. Objective: This paper investigates which hard and soft skills are more required in IT companies by analyzing the description of 20,000 job opportunities. Method: We applied open card sorting to perform a high-level analysis on which types of hard skills are more requested. Further, we manually analyzed the most mentioned soft skills. Results: Programming languages are the most demanded hard skills. Communication, collaboration, and problem-solving are the most demanded soft skills. Conclusion: We recommend developers to organize their resumé according to the positions they are applying. We also highlight the importance of soft skills, as they appear in many job opportunities.