Önder Babur

h-index: 11

4 papers

688 citations

Novelty: 40%

AI Score: 20

Ranked #184,323 of 194,257 authors (top 95%); #2,407 in SE (top 79%)

4 Papers

4.8 · SE · Jul 7

Domain-Driven Design in Practice: A Large-Scale Empirical Characterisation of the Open-Source Ecosystem

Ozan Özkan, Önder Babur, Mark van den Brand

Context: Domain-Driven Design (DDD) is a leading paradigm for managing software complexity, yet research remains largely theoretical; our prior work found nearly 39% of DDD studies lack rigorous empirical evaluation, leaving practical adoption largely unexamined at scale. Objective: We provide the first large-scale characterisation of the DDD landscape on GitHub, a data-driven baseline for how the paradigm is implemented and sustained in practice. Method: Using a Mining Software Repositories (MSR) approach with a hybrid strategy (topics and README keywords), we identified 11,742 candidate repositories. To address label noise, we built a novel semantic validation pipeline using GPT-4o with a triplicate majority-vote strategy, yielding 2,502 verified repositories. Validation against a manually labelled sample showed substantial agreement with human experts (kappa = 0.77). Results: DDD adoption accelerated sharply after a 2017 inflection point, and the resulting projects are notably long-lived: their median lifespan exceeds that of the typical GitHub project by over an order of magnitude, indicating sustained, professional-grade engineering rather than short-lived experiments. Layered and Clean Architecture dominate, while CQRS and Event Sourcing recur in distributed, data-intensive systems. Notably, the data challenge the Java-centric assumption of much academic work: C# and TypeScript, not Java, lead practical adoption. Conclusions: DDD has matured into a stable, professional-grade practice adopted across diverse languages and domains. However, a quarter of projects (25.3%) record no explicit business context, revealing a persistent gap between how domain intent is designed and how it is preserved in version control. We call for lightweight architectural traceability standards and offer guidance for teams reusing these repositories as reference implementations.
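The two validation steps named in the Method (a triplicate majority vote, then a kappa agreement check) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the boolean labels, and the sample data are all hypothetical, and the abstract does not say which kappa variant was used, so Cohen's kappa for two raters is assumed here.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; with three binary votes there is no tie."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: three model verdicts per repository ("is this really DDD?").
model_runs = [
    [True, True, False],
    [False, False, False],
    [True, False, False],
    [True, True, True],
]
model_labels = [majority_vote(run) for run in model_runs]
human_labels = [True, False, True, True]  # hypothetical manual labels
kappa = cohen_kappa(model_labels, human_labels)
print(model_labels, kappa)
```

Voting over an odd number of runs guarantees a decision for binary labels, and kappa corrects raw agreement for the agreement expected by chance.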

8.6 · SE · Jun 6, 2021

Clone-Seeker: Effective Code Clone Search Using Annotations

Muhammad Hammad, Önder Babur, Hamid Abdul Basit et al.

Source code search plays an important role in software development, e.g. for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is needed for tasks like automated software transplantation, software diversification, and software repair. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones means retrieving the relevant code fragments for a given search query. We propose a novel approach called Clone-Seeker that focuses on utilizing clone class features in retrieving code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clones augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or automatically generated from the source code of the entire clone class. This approach helps developers to perform code clone search based on a search query written either as source code terms or as natural language. In our quantitative evaluation, we show that (1) Clone-Seeker has a higher recall than state-of-the-art approaches when searching for semantic code clones (i.e., Type-4) in BigCloneBench; and (2) Clone-Seeker can accurately search for relevant code clones by applying natural language queries.
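The metadata idea described above (identifiers plus semantic keywords, stored as one document per clone and queried with either code terms or natural language) can be sketched roughly as follows. This is an illustration under assumptions, not Clone-Seeker itself: the helper names, the sample clones, and the bag-of-words cosine ranking are all hypothetical stand-ins for whatever pre-processing and retrieval model the paper uses.

```python
import math
import re
from collections import Counter


def split_terms(text):
    """Split camelCase, snake_case, or plain text into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return [p.lower() for p in parts]


def build_metadata(identifiers, keywords):
    """Build one bag-of-words 'document' from identifiers and keywords."""
    tokens = []
    for ident in identifiers:
        tokens.extend(split_terms(ident))
    tokens.extend(k.lower() for k in keywords)
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, index):
    """Rank clone ids by similarity of their metadata to the query."""
    q = Counter(split_terms(query))
    scored = [(clone_id, cosine(q, doc)) for clone_id, doc in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical clones: identifiers from the code, keywords from an annotation.
clones = {
    "clone_1": build_metadata(
        ["copyFile", "inputStream", "outputStream"], ["copy", "file"]
    ),
    "clone_2": build_metadata(["bubbleSort", "swapItems"], ["sort", "array"]),
}

print(search("copy a file", clones)[0][0])  # natural-language query
print(search("sortArray", clones)[0][0])    # source-code-term query
```

Because identifiers and keywords share one vocabulary after splitting, the same index serves both query styles.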

7.3 · SE · Oct 2, 2020

Augmenting Machine Learning with Information Retrieval to Recommend Real Cloned Code Methods for Code Completion

Muhammad Hammad, Önder Babur, Hamid Abdul Basit

Software developers frequently reuse source code from repositories as it saves development time and effort. Code clones accumulated in these repositories hence represent frequently repeated functionality and are candidates for reuse in exploratory or rapid development. In previous work, we introduced DeepClone, a deep neural network model trained by fine-tuning a GPT-2 model on the BigCloneBench dataset to predict code clone methods. The probabilistic nature of DeepClone's output generation can lead to syntax and logic errors that require manual editing of the output before reuse. In this paper, we propose a novel approach of applying an information retrieval (IR) technique on top of DeepClone output to recommend real clone methods closely matching the predicted output. We have quantitatively evaluated our strategy, showing that the proposed approach significantly improves the quality of recommendation.
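The retrieval step can be pictured as a nearest-match lookup: take the possibly flawed generated method and return the most similar real method from a corpus. The sketch below assumes token-set Jaccard similarity as the matching function, which is a stand-in chosen for brevity; the abstract does not name the IR technique, and the function names and sample Java snippets are hypothetical.

```python
import re


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def jaccard(a, b):
    """Jaccard similarity of two token lists, compared as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def recommend(predicted, corpus, top_k=1):
    """Return the top_k real methods most similar to the predicted text."""
    pred_tokens = tokenize(predicted)
    ranked = sorted(
        corpus,
        key=lambda method: jaccard(pred_tokens, tokenize(method)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical corpus of real, compilable clone methods.
corpus = [
    "int sum(int[] xs) { int s = 0; for (int x : xs) s += x; return s; }",
    "void copy(InputStream in, OutputStream out) { out.write(in.read()); }",
]
# Hypothetical model output with missing semicolons and a wrong operator.
predicted = "int sum(int[] xs) { int s = 0 for (int x : xs) s + x; return s }"

best = recommend(predicted, corpus)[0]
print(best)
```

Returning a stored method instead of the raw prediction trades exact fit to the context for code that is known to be syntactically valid.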

12.8 · SE · Jul 22, 2020

DeepClone: Modeling Clones to Generate Code Predictions

Muhammad Hammad, Önder Babur, Hamid Abdul Basit et al.

Programmers often reuse code from source code repositories to reduce development effort. Code clones are candidates for reuse in exploratory or rapid development, as they represent frequently repeated functionality in software systems. To facilitate code clone reuse, we propose DeepClone, a novel approach utilizing a deep learning algorithm for modeling code clones to predict the next set of tokens (possibly a complete clone method body) based on the code written so far. The predicted tokens require minimal customization to fit the context. DeepClone applies natural language processing techniques to learn from a large code corpus, and generates code tokens using the learned model. We have quantitatively evaluated our solution to assess (1) our model's quality and its accuracy in token prediction, and (2) its performance and effectiveness in clone method prediction. We also discuss various application scenarios for our approach.