Rudolf Ferenc

SE
h-index9
8papers
80citations
Novelty35%
AI Score41

8 Papers

12.2SEApr 20
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts

Tamás Aladics, Norbert Vándor, Rudolf Ferenc et al.

Static code analysis (SCA) tools are widely used as effective ways to detect bugs and vulnerabilities in software systems. However, the reports generated by these tools often contain a large number of non-actionable findings, which can overwhelm developers to the point of ignoring them altogether -- this phenomenon is known as "alert fatigue". In this paper, we combat alert fatigue by proposing STAF: Sentence Transformer-based Actionability Filtering. Our approach leverages a transformer based architecture with sentence embeddings to classify findings into actionable and non-actionable categories. Evaluating STAF on a large dataset of reports from Java projects, we demonstrate that our method can effectively reduce the number of non-actionable findings while maintaining a high level of accuracy in identifying actionable issues. The results show that our approach can improve the usability of static analysis tools reaching an F1 score of 89%, outperforming existing methods for SCA warning filtering by at least 11% in a within-project setting and by at least 6% in a cross-project setting. By providing a more focused and relevant set of findings, we aim to enhance the overall effectiveness of static analysis in software development.

SEApr 22, 2024
Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation

Gábor Antal, Richárd Vozár, Rudolf Ferenc

The emergence of advanced neural networks has opened up new ways in automated code generation from conceptual models, promising to enhance software development processes. This paper presents a preliminary evaluation of GPT-4-Vision, a state-of-the-art deep learning model, and its capabilities in transforming Unified Modeling Language (UML) class diagrams into fully operating Java class files. In our study, we used exported images of 18 class diagrams comprising 10 single-class and 8 multi-class diagrams. We used 3 different prompts for each input, and we manually evaluated the results. We created a scoring system in which we scored the occurrence of elements found in the diagram within the source code. On average, the model was able to generate source code for 88% of the elements shown in the diagrams. Our results indicate that GPT-4-Vision exhibits proficiency in handling single-class UML diagrams, successfully transforming them into syntactically correct class files. However, for multi-class UML diagrams, the model's performance is weaker compared to single-class diagrams. In summary, further investigations are necessary to exploit the model's potential completely.

SEJun 13, 2025
Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study

Gábor Antal, Bence Bogenfürst, Rudolf Ferenc et al.

Recent advancements in large language models (LLMs) have shown promise for automated vulnerability detection and repair in software systems. This paper investigates the performance of GPT-4o in repairing Java vulnerabilities from a widely used dataset (Vul4J), exploring how different contextual information affects automated vulnerability repair (AVR) capabilities. We compare the latest GPT-4o's performance against previous results with GPT-4 using identical prompts. We evaluated nine additional prompts crafted by us that contain various contextual information such as CWE or CVE information, and manually extracted code contexts. Each prompt was executed three times on 42 vulnerabilities, and the resulting fix candidates were validated using Vul4J's automated testing framework. Our results show that GPT-4o performed 11.9\% worse on average than GPT-4 with the same prompt, but was able to fix 10.5\% more distinct vulnerabilities in the three runs together. CVE information significantly improved repair rates, while the length of the task description had minimal impact. Combining CVE guidance with manually extracted code context resulted in the best performance. Using our \textsc{Top}-3 prompts together, GPT-4o repaired 26 (62\%) vulnerabilities at least once, outperforming both the original baseline (40\%) and its reproduction (45\%), suggesting that ensemble prompt strategies could improve vulnerability repair in zero-shot settings.

SEJun 13, 2025
Leveraging GPT-4 for Vulnerability-Witnessing Unit Test Generation

Gábor Antal, Dénes Bán, Martin Isztin et al.

In the life-cycle of software development, testing plays a crucial role in quality assurance. Proper testing not only increases code coverage and prevents regressions but it can also ensure that any potential vulnerabilities in the software are identified and effectively fixed. However, creating such tests is a complex, resource-consuming manual process. To help developers and security experts, this paper explores the automatic unit test generation capability of one of the most widely used large language models, GPT-4, from the perspective of vulnerabilities. We examine a subset of the VUL4J dataset containing real vulnerabilities and their corresponding fixes to determine whether GPT-4 can generate syntactically and/or semantically correct unit tests based on the code before and after the fixes as evidence of vulnerability mitigation. We focus on the impact of code contexts, the effectiveness of GPT-4's self-correction ability, and the subjective usability of the generated test cases. Our results indicate that GPT-4 can generate syntactically correct test cases 66.5\% of the time without domain-specific pre-training. Although the semantic correctness of the fixes could be automatically validated in only 7. 5\% of the cases, our subjective evaluation shows that GPT-4 generally produces test templates that can be further developed into fully functional vulnerability-witnessing tests with relatively minimal manual effort. Therefore, despite the limited data, our initial findings suggest that GPT-4 can be effectively used in the generation of vulnerability-witnessing tests. It may not operate entirely autonomously, but it certainly plays a significant role in a partially automated process.

SEOct 11, 2021
Bug Prediction Using Source Code Embedding Based on Doc2Vec

Tamás Aladics, Judit Jász, Rudolf Ferenc

Bug prediction is a resource demanding task that is hard to automate using static source code analysis. In many fields of computer science, machine learning has proven to be extremely useful in tasks like this, however, for it to work we need a way to use source code as input. We propose a simple, but meaningful representation for source code based on its abstract syntax tree and the Doc2Vec embedding algorithm. This representation maps the source code to a fixed length vector which can be used for various upstream tasks -- one of which is bug prediction. We measured this approach's validity by itself and its effectiveness compared to bug prediction based solely on code metrics. We also experimented on numerous machine learning approaches to check the connection between different embedding parameters with different machine learning models. Our results show that this representation provides meaningful information as it improves the bug prediction accuracy in most cases, and is always at least as good as only using code metrics as features.

CRMay 16, 2021
Improving Vulnerability Prediction of JavaScript Functions Using Process Metrics

Tamás Viszkok, Péter Hegedűs, Rudolf Ferenc

Due to the growing number of cyber attacks against computer systems, we need to pay special attention to the security of our software systems. In order to maximize the effectiveness, excluding the human component from this process would be a huge breakthrough. The first step towards this is to automatically recognize the vulnerable parts in our code. Researchers put a lot of effort into creating machine learning models that could determine if a given piece of code, or to be more precise, a selected function, contains any vulnerabilities or not. We aim at improving the existing models, building on previous results in predicting vulnerabilities at the level of functions in JavaScript code using the well-known static source code metrics. In this work, we propose to include several so-called process metrics (e.g., code churn, number of developers modifying a file, or the age of the changed source code) into the set of features, and examine how they affect the performance of the function-level JavaScript vulnerability prediction models. We can confirm that process metrics significantly improve the prediction power of such models. On average, we observed a 8.4% improvement in terms of F-measure (from 0.764 to 0.848), 3.5% improvement in terms of precision (from 0.953 to 0.988) and a 6.3% improvement in terms of recall (from 0.697 to 0.760).

SENov 2, 2020
Employing Partial Least Squares Regression with Discriminant Analysis for Bug Prediction

Rudolf Ferenc, István Siket, Péter Hegedűs et al.

Forecasting defect proneness of source code has long been a major research concern. Having an estimation of those parts of a software system that most likely contain bugs may help focus testing efforts, reduce costs, and improve product quality. Many prediction models and approaches have been introduced during the past decades that try to forecast bugged code elements based on static source code metrics, change and history metrics, or both. However, there is still no universal best solution to this problem, as most suitable features and models vary from dataset to dataset and depend on the context in which we use them. Therefore, novel approaches and further studies on this topic are highly necessary. In this paper, we employ a chemometric approach - Partial Least Squares with Discriminant Analysis (PLS-DA) - for predicting bug prone Classes in Java programs using static source code metrics. To our best knowledge, PLS-DA has never been used before as a statistical approach in the software maintenance domain for predicting software errors. In addition, we have used rigorous statistical treatments including bootstrap resampling and randomization (permutation) test, and evaluation for representing the software engineering results. We show that our PLS-DA based prediction model achieves superior performances compared to the state-of-the-art approaches (i.e. F-measure of 0.44-0.47 at 90% confidence level) when no data re-sampling applied and comparable to others when applying up-sampling on the largest open bug dataset, while training the model is significantly faster, thus finding optimal parameters is much easier. In terms of completeness, which measures the amount of bugs contained in the Java Classes predicted to be defective, PLS-DA outperforms every other algorithm: it found 69.3% and 79.4% of the total bugs with no re-sampling and up-sampling, respectively.

SEJun 17, 2020
An Automatically Created Novel Bug Dataset and its Validation in Bug Prediction

Rudolf Ferenc, Péter Gyimesi, Gábor Gyimesi et al.

Bugs are inescapable during software development due to frequent code changes, tight deadlines, etc.; therefore, it is important to have tools to find these errors. One way of performing bug identification is to analyze the characteristics of buggy source code elements from the past and predict the present ones based on the same characteristics, using e.g. machine learning models. To support model building tasks, code elements and their characteristics are collected in so-called bug datasets which serve as the input for learning. We present the \emph{BugHunter Dataset}: a novel kind of automatically constructed and freely available bug dataset containing code elements (files, classes, methods) with a wide set of code metrics and bug information. Other available bug datasets follow the traditional approach of gathering the characteristics of all source code elements (buggy and non-buggy) at only one or more pre-selected release versions of the code. Our approach, on the other hand, captures the buggy and the fixed states of the same source code elements from the narrowest timeframe we can identify for a bug's presence, regardless of release versions. To show the usefulness of the new dataset, we built and evaluated bug prediction models and achieved F-measure values over 0.74.