SEAug 21, 2023Code
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language ModelsMartin Weyssow, Xin Zhou, Kisub Kim et al.
Large language models (LLMs) demonstrate impressive capabilities to generate accurate code snippets given natural language intents in a zero-shot manner, i.e., without the need for specific fine-tuning. While prior studies have highlighted the advantages of fine-tuning LLMs, this process incurs high computational costs, making it impractical in resource-scarce environments, particularly for models with billions of parameters. To address these challenges, previous research explored in-context learning (ICL) and retrieval-augmented generation (RAG) as strategies to guide the LLM generative process with task-specific prompt examples. However, ICL and RAG introduce inconveniences, such as the need for designing contextually relevant prompts and the absence of learning task-specific parameters, thereby limiting downstream task performance. In this context, we foresee parameter-efficient fine-tuning (PEFT) as a promising approach to efficiently specialize LLMs to task-specific data while maintaining reasonable resource consumption. In this paper, we deliver a comprehensive study of PEFT techniques for LLMs in the context of automated code generation. Our comprehensive investigation of PEFT techniques for LLMs reveals their superiority and potential over ICL and RAG across a diverse set of LLMs and three representative Python code generation datasets: Conala, CodeAlpacaPy, and APPS. Furthermore, our study highlights the potential for tuning larger LLMs and significant reductions in memory usage by combining PEFT with quantization. Therefore, this study opens opportunities for broader applications of PEFT in software engineering scenarios. Our code is available at https://github.com/martin-wey/peft-llm-code/.
SEDec 7, 2022
Towards using Few-Shot Prompt Learning for Automating Model CompletionMeriem Ben Chaaben, Lola Burgueño, Houari Sahraoui
We propose a simple yet a novel approach to improve completion in domain modeling activities. Our approach exploits the power of large language models by using few-shot prompt learning without the need to train or fine-tune those models with large datasets that are scarce in this field. We implemented our approach and tested it on the completion of static and dynamic domain diagrams. Our initial evaluation shows that such an approach is effective and can be integrated in different ways during the modeling activities.
CLJun 23, 2022
AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language modelsJosé Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado et al.
The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the linguistic properties of hidden vector spaces, previous works have shown that these pre-trained language models encode simple linguistic properties in their hidden representations. However, none of the previous work assessed whether these models encode the whole grammatical structure of a programming language. In this paper, we prove the existence of a syntactic subspace, lying in the hidden representations of pre-trained language models, which contain the syntactic information of the programming language. We show that this subspace can be extracted from the models' representations and define a novel probing method, the AST-Probe, that enables recovering the whole abstract syntax tree (AST) of an input code snippet. In our experimentations, we show that this syntactic subspace exists in five state-of-the-art pre-trained language models. In addition, we highlight that the middle layers of the models are the ones that encode most of the AST information. Finally, we estimate the optimal size of this syntactic subspace and show that its dimension is substantially lower than those of the models' representation spaces. This suggests that pre-trained language models use a small part of their representation spaces to encode syntactic information of the programming languages.
19.8SEApr 9
Modeling Sampling Workflows for Code RepositoriesRomain Lefeuvre, Maïwenn Le Goasteller, Jessie Galasso et al.
Empirical software engineering research often depends on datasets of code repository artifacts, where sampling strategies are employed to enable large-scale analyses. The design and evaluation of these strategies are critical, as they directly influence the generalizability of research findings. However, sampling remains an underestimated aspect in software engineering research: we identify two main challenges related to (1) the design and representativeness of sampling approaches, and (2) the ability to reason about the implications of sampling decisions on generalizability. To address these challenges, we propose a Domain-Specific Language (DSL) to explicitly describe complex sampling strategies through composable sampling operators. This formalism supports both the specification and the reasoning about the generalizability of results based on the applied sampling strategies. We implement the DSL as a Python-based fluent API, and demonstrate how it facilitates representativeness reasoning using statistical indicators extracted from sampling workflows. We validate our approach through a case study of MSR papers involving code repository sampling. Our results show that the DSL can model the sampling strategies reported in recent literature.
SEDec 20, 2023Code
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of CodeMartin Weyssow, Claudio Di Sipio, Davide Di Ruscio et al.
Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence of a long-term temporal dimension in existing code change datasets, limiting their suitability in lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL, comprising 71 machine-learning-based projects mined from Software Heritage. This dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL enables researchers studying the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help studying data distribution shifts within software repositories and the evolution of API usages over time.
SEDec 6, 2016Code
Automated Inference of Software Library Usage PatternsMohamed Aymen Saied, Ali Ouni, Houari Sahraoui et al.
Modern software systems are increasingly dependent on third-party libraries. It is widely recognized that using mature and well-tested third-party libraries can improve developers' productivity, reduce time-to-market, and produce more reliable software. Today's open-source repositories provide a wide range of libraries that can be freely downloaded and used. However, as software libraries are documented separately but intended to be used together, developers are unlikely to fully take advantage of these reuse opportunities. In this paper, we present a novel approach to automatically identify third-party library usage patterns, i.e., collections of libraries that are commonly used together by developers. Our approach employs hierarchical clustering technique to group together software libraries based on external client usage. To evaluate our approach, we mined a large set of over 6,000 popular libraries from Maven Central Repository and investigated their usage by over 38,000 client systems from the Github repository. Our experiments show that our technique is able to detect the majority (77%) of highly consistent and cohesive library usage patterns across a considerable number of client systems.
SEJun 1, 2016Code
Recovering Architectural Variability of a Family of Product VariantsAnas Shatnawi, Abdelhak Seriai, Houari Sahraoui
A Software Product Line (SPL) aims at applying a pre-planned systematic reuse of large-grained software artifacts to increase the software productivity and reduce the development cost. The idea of SPL is to analyze the business domain of a family of products to identify the common and the variable parts between the products. However, it is common for companies to develop, in an ad-hoc manner (e.g. clone and own), a set of products that share common functionalities and differ in terms of others. Thus, many recent research contributions are proposed to re-engineer existing product variants to a SPL. Nevertheless, these contributions are mostly focused on managing the variability at the requirement level. Very few contributions address the variability at the architectural level despite its major importance. Starting from this observation, we propose, in this paper, an approach to reverse engineer the architecture of a set of product variants. Our goal is to identify the variability and dependencies among architectural-element variants at the architectural level. Our work relies on Formal Concept Analysis (FCA) to analyze the variability. To validate the proposed approach, we experimented on two families of open-source product variants; Mobile Media and Health Watcher. The results show that our approach is able to identify the architectural variability and the dependencies.
SEFeb 10, 2025
Combining Large Language Models with Static Analyzers for Code Review GenerationImen Jaoua, Oussama Ben Sghaier, Houari Sahraoui
Code review is a crucial but often complex, subjective, and time-consuming activity in software development. Over the past decades, significant efforts have been made to automate this process. Early approaches focused on knowledge-based systems (KBS) that apply rule-based mechanisms to detect code issues, providing precise feedback but struggling with complex, context-dependent cases. More recent work has shifted toward fine-tuning pre-trained language models for code review, enabling broader issue coverage but often at the expense of precision. In this paper, we propose a hybrid approach that combines the strengths of KBS and learning-based systems (LBS) to generate high-quality, comprehensive code reviews. Our method integrates knowledge at three distinct stages of the language model pipeline: during data preparation (Data-Augmented Training, DAT), at inference (Retrieval-Augmented Generation, RAG), and after inference (Naive Concatenation of Outputs, NCO). We empirically evaluate our combination strategies against standalone KBS and LBS fine-tuned on a real-world dataset. Our results show that these hybrid strategies enhance the relevance, completeness, and overall quality of review comments, effectively bridging the gap between rule-based tools and deep learning models.
16.2LGApr 22
Generative Flow Networks for Model Adaptation in Digital Twins of Natural SystemsPascal Archambault, Houari Sahraoui, Eugene Syriani
Digital twins of natural systems must remain aligned with physical systems that evolve over time, are only partially observed, and are typically modeled by mechanistic simulators whose parameters cannot be measured directly. In such settings, model adaptation is naturally posed as a simulation-based inference problem. However, sparse and indirect observations often fail to identify a unique and optimal calibration, leaving several simulator parameterizations compatible with the available evidence. This article presents a GFlowNet-based approach to model adaptation for digital twins of natural systems. We formulate adaptation as a generative modeling problem over complete simulator configurations, so that plausible parameterizations can be sampled with probability proportional to a reward derived from agreement between simulated and observed behavior. Using a controlled environment agriculture case study based on a mechanistic tomato model, we show that the learned policy recovers dominant regions of the adaptation landscape, retrieves strong calibration hypotheses, and preserves multiple plausible configurations under uncertainty.
SEOct 16, 2024
On the Utility of Domain Modeling Assistance with Large Language ModelsMeriem Ben Chaaben, Lola Burgueño, Istvan David et al.
Model-driven engineering (MDE) simplifies software development through abstraction, yet challenges such as time constraints, incomplete domain understanding, and adherence to syntactic constraints hinder the design process. This paper presents a study to evaluate the usefulness of a novel approach utilizing large language models (LLMs) and few-shot prompt learning to assist in domain modeling. The aim of this approach is to overcome the need for extensive training of AI-based completion models on scarce domain-specific datasets and to offer versatile support for various modeling activities, providing valuable recommendations to software modelers. To support this approach, we developed MAGDA, a user-friendly tool, through which we conduct a user study and assess the real-world applicability of our approach in the context of domain modeling, offering valuable insights into its usability and effectiveness.
7.7IRApr 10
From Intent to AI Pipelines: A Controlled Agentic Framework for Non-AI Expert ScientistsHyacinth Ali, Jessie Galasso-Carbonnel, Houari Sahraoui
Artificial Intelligence (AI) pipelines have become integral to modern research, supporting fields such as Medical Sciences, Agriculture, and Social Sciences, and enabling large-scale data analysis, predictive modeling, and the automation of complex tasks. However, designing and implementing AI solutions remains challenging for many researchers due to the expertise required in the design and development of end-to-end AI systems. To address this gap, we present Domain-Driven Adaptable AI Pipelines (DDAP), a controlled, human-in-the-loop, agentic framework that leverages large language models to guide users in a systematic construction of AI pipelines and their corresponding implementation code. DDAP structures the development process into four stages: problem definition, compute environment specification, pipeline generation, and code generation. Through this staged interaction, the framework adapts to domain context, user expertise, and resource constraints, while maintaining user control over key decisions. We evaluate DDAP across multiple datasets spanning business, biology, and health science domains by comparing its AI models against expert-developed models. The experimental results show that DDAP achieves competitive results in several tasks compared to expert baselines, although performance varies across problem types, particularly for text-based clustering tasks. By combining guided interaction, adaptability, and reproducibility, DDAP demonstrates that a controlled agentic framework can generate competitive AI pipelines for non-expert users.
SEJun 4, 2025
Leveraging Reward Models for Guiding Code Review Comment GenerationOussama Ben Sghaier, Rosalia Tufano, Gabriele Bavota et al.
Code review is a crucial component of modern software development, involving the evaluation of code quality, providing feedback on potential issues, and refining the code to address identified problems. Despite these benefits, code review can be rather time consuming, and influenced by subjectivity and human factors. For these reasons, techniques to (partially) automate the code review process have been proposed in the literature. Among those, the ones exploiting deep learning (DL) are able to tackle the generative aspect of code review, by commenting on a given code as a human reviewer would do (i.e., comment generation task) or by automatically implementing code changes required to address a reviewer's comment (i.e., code refinement task). In this paper, we introduce CoRAL, a deep learning framework automating review comment generation by exploiting reinforcement learning with a reward mechanism considering both the semantics of the generated comments as well as their usefulness as input for other models automating the code refinement task. The core idea is that if the DL model generates comments that are semantically similar to the expected ones or can be successfully implemented by a second model specialized in code refinement, these comments are likely to be meaningful and useful, thus deserving a high reward in the reinforcement learning framework. We present both quantitative and qualitative comparisons between the comments generated by CoRAL and those produced by the latest baseline techniques, highlighting the effectiveness and superiority of our approach.
SEMar 27, 2025
MONO2REST: Identifying and Exposing Microservices: a Reusable RESTification ApproachMatthéo Lecrivain, Hanifa Barry, Dalila Tamzalit et al.
The microservices architectural style has become the de facto standard for large-scale cloud applications, offering numerous benefits in scalability, maintainability, and deployment flexibility. Many organizations are pursuing the migration of legacy monolithic systems to a microservices architecture. However, this process is challenging, risky, time-intensive, and prone-to-failure while several organizations lack necessary financial resources, time, or expertise to set up this migration process. So, rather than trying to migrate a legacy system where migration is risky or not feasible, we suggest exposing it as a microservice application without without having to migrate it. In this paper, we present a reusable, automated, two-phase approach that combines evolutionary algorithms with machine learning techniques. In the first phase, we identify microservices at the method level using a multi-objective genetic algorithm that considers both structural and semantic dependencies between methods. In the second phase, we generate REST APIs for each identified microservice using a classification algorithm to assign HTTP methods and endpoints. We evaluated our approach with a case study on the Spring PetClinic application, which has both monolithic and microservices implementations that serve as ground truth for comparison. Results demonstrate that our approach successfully aligns identified microservices with those in the reference microservices implementation, highlighting its effectiveness in service identification and API generation.
SEMar 14, 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding PreferencesMartin Weyssow, Aton Kamanda, Xin Zhou et al.
Evaluating the alignment of large language models (LLMs) with user-defined coding preferences is a challenging endeavour that requires a deep assessment of LLMs' outputs. Existing methods and benchmarks rely primarily on automated metrics and static analysis tools, which often fail to capture the nuances of user instructions and LLM outputs. To address this gap, we propose using the LLM-as-a-Judge methodology to evaluate the alignment of LLMs with coding preferences. Based on this approach, we present CodeUltraFeedback, a comprehensive dataset designed to facilitate the evaluation and improvement of LLM alignment. CodeUltraFeedback consists of 10,000 coding instructions, each annotated with four responses generated from a diverse pool of 14 LLMs. These responses are ranked based on five distinct coding preferences using GPT-3.5 as a judge, providing both numerical scores and detailed textual feedback. Our analysis of CodeUltraFeedback reveals that responses from GPT-3.5 and GPT-4 are generally preferred over those from open-weight LLMs, highlighting significant differences in alignment between closed and open-weight models. In turn, we explore the usage of CodeUltraFeedback as feedback data to fine-tune and align CodeLlama-7B-Instruct using supervised fine-tuning (SFT) and reinforcement learning from AI feedback (RLAIF) with direct preference optimization (DPO). The resulting aligned CodeLlama-7B-Instruct model outperforms larger LLMs in terms of alignment with coding preferences and shows improved functional correctness on the HumanEval+ benchmark compared to the original instruct model. Therefore, our contributions bridge the gap in preference tuning of LLMs for code and set the stage for further advancements in model alignment and RLAIF in automated software engineering.
SEMay 6, 2023
On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of CodeMartin Weyssow, Xin Zhou, Kisub Kim et al.
Pre-trained language models (PLMs) have become a prevalent technique in deep learning for code, utilizing a two-stage pre-training and fine-tuning procedure to acquire general knowledge about code and specialize in a variety of downstream tasks. However, the dynamic nature of software codebases poses a challenge to the effectiveness and robustness of PLMs. In particular, world-realistic scenarios potentially lead to significant differences between the distribution of the pre-training and test data, i.e., distribution shift, resulting in a degradation of the PLM's performance on downstream tasks. In this paper, we stress the need for adapting PLMs of code to software data whose distribution changes over time, a crucial problem that has been overlooked in previous works. The motivation of this work is to consider the PLM in a non-stationary environment, where fine-tuning data evolves over time according to a software evolution scenario. Specifically, we design a scenario where the model needs to learn from a stream of programs containing new, unseen APIs over time. We study two widely used PLM architectures, i.e., a GPT2 decoder and a RoBERTa encoder, on two downstream tasks, API call and API usage prediction. We demonstrate that the most commonly used fine-tuning technique from prior work is not robust enough to handle the dynamic nature of APIs, leading to the loss of previously acquired knowledge i.e., catastrophic forgetting. To address these issues, we implement five continual learning approaches, including replay-based and regularization-based methods. Our findings demonstrate that utilizing these straightforward methods effectively mitigates catastrophic forgetting in PLMs across both downstream tasks while achieving comparable or superior performance.
SEJan 19, 2022
Code Sophistication: From Code Recommendation to Logic RecommendationJessie Galasso, Michalis Famelis, Houari Sahraoui
A typical approach to programming is to first code the main execution scenario, and then focus on filling out alternative behaviors and corner cases. But, almost always, there exist unusual conditions that trigger atypical behaviors, which are hard to predict in program specifications, and are thus often not coded. In this paper, we consider the problem of detecting and recommending such missing behaviors, a task that we call code sophistication. Previous research on coding assistants usually focuses on recommending code fragments based on specifications of the intended behavior. In contrast, code sophistication happens in the absence of a specification, aiming to help developers complete the logic of their programs with missing and unspecified behaviors. We outline the research challenges to this problem and present early results showing how program logic can be completed by leveraging code structure and information about the usage of input parameters.
SEJan 10, 2022
Better Modeling the Programming World with Code Concept Graphs-augmented Multi-modal LearningMartin Weyssow, Houari Sahraoui, Bang Liu
The progress made in code modeling has been tremendous in recent years thanks to the design of natural language processing learning approaches based on state-of-the-art model architectures. Nevertheless, we believe that the current state-of-the-art does not focus enough on the full potential that data may bring to a learning process in software engineering. Our vision articulates on the idea of leveraging multi-modal learning approaches to modeling the programming world. In this paper, we investigate one of the underlying idea of our vision whose objective based on concept graphs of identifiers aims at leveraging high-level relationships between domain concepts manipulated through particular language constructs. In particular, we propose to enhance an existing pretrained language model of code by joint-learning it with a graph neural network based on our concept graphs. We conducted a preliminary evaluation that shows gain of effectiveness of the models for code search using a simple joint-learning method and prompts us to further investigate our research vision.
SEApr 4, 2021
Recommending Metamodel Concepts during Modeling Activities with Pre-Trained Language ModelsMartin Weyssow, Houari Sahraoui, Eugene Syriani
The design of conceptually sound metamodels that embody proper semantics in relation to the application domain is particularly tedious in Model-Driven Engineering. As metamodels define complex relationships between domain concepts, it is crucial for a modeler to define these concepts thoroughly while being consistent with respect to the application domain. We propose an approach to assist a modeler in the design of a metamodel by recommending relevant domain concepts in several modeling scenarios. Our approach does not require to extract knowledge from the domain or to hand-design completion rules. Instead, we design a fully data-driven approach using a deep learning model that is able to abstract domain concepts by learning from both structural and lexical metamodel properties in a corpus of thousands of independent metamodels. We evaluate our approach on a test set containing 166 metamodels, unseen during the model training, with more than 5000 test samples. Our preliminary results show that the trained model is able to provide accurate top-$5$ lists of relevant recommendations for concept renaming scenarios. Although promising, the results are less compelling for the scenario of the iterative construction of the metamodel, in part because of the conservative strategy we use to evaluate the recommendations.
SEDec 28, 2020
API Misuse Detection An Immune System inspired ApproachMaxime Gallais-Jimenez, Hoan A. Nguyen, Mohamed Aymen Saied et al.
APIs are essential ingredients for developing complex software systems. However, they are difficult to learn and to use. Thus, developers may misuse them, which results in various types of issues. In this paper, we explore the use of a bio-inspired approach (artificial immune system) to detect API misuses in client code. We built APIMMUNE, a novel API misuse detector. We collect normal usages of a given APIs from the set of client programs using the APIs, especially after some API usages were fixed in those programs. The normal API usages are considered as normal body cells. We transform them into normal-usage signatures. Then, artificial detectors are randomly generated by generating artificial deviations from these usages with the objective of being different from the normal usage signatures. The generated detectors have the ability to detect risky uses of APIs exactly as the immune system detects foreign cells of the organism. Moreover, for the detection purpose, only the artificial detectors are necessary, without the need to disclose the code used to generate them. Our approach was evaluated on the misuses dataset of three APIs as well as on known misuses from a state of the art APIs misuses benchmarking dataset. APIMMUNE was also compared to four state-of-the-art API misuse detection tools. The results show that APIMMUNE has good detection accuracy and performance, and it can complement pattern-based tools for uncommon misuses detection.
SEDec 14, 2020
Fixing Multiple Type Errors in Model Transformations with Alternative Oracles to Test CasesZahra VaraminyBahnemiry, Jessie Galasso, Houari Sahraoui
This paper addresses the issue of correcting type errors in model transformations in realistic scenarios where neither predefined patches nor behavior-safe guards such as test suites are available. Instead of using predefined patches targeting isolated errors of specific categories, we propose to explore the space of possible patches by combining basic edit operations for model transformation programs. To guide the search, we define two families of objectives: one to limit the number of type errors and the other to preserve the transformation behavior. To approximate the latter, we study two objectives: minimizing the number of changes and keeping the changes local. Additionally, we define four heuristics to refine candidate patches to increase the likelihood of correcting type errors while preserving the transformation behavior. We implemented our approach for the ATL language using the evolutionary algorithm NSGA-II, and performed an evaluation based on three published case studies. The evaluation results show that our approach was able to automatically correct on average more than82% of type errors for two cases and more than 56% for the third case.
SEOct 9, 2020
A Generic Approach to Detect Design Patterns in Model Transformations Using a String-Matching AlgorithmChihab eddine Mokaddem, Houari Sahraoui, Eugene Syriani
Maintaining software artifacts is among the hardest tasks an engineer faces. Like any other piece of code, model transformations developed by engineers are also subject to maintenance. To facilitate the comprehension of programs, software engineers rely on many techniques, such as design pattern detection. Therefore, detecting design patterns in model transformation implementations is of tremendous value for developers. In this paper, we propose a generic technique to detect design patterns and their variations in model transformation implementations automatically. It takes as input a set of model transformation rules and the participants of a model transformation design pattern to find occurrences of the latter in the former. The technique also detects certain kinds of degenerate forms of the pattern, thus indicating potential opportunities to improve the model transformation implementation.
SEMar 16, 2018
Identifying Components from Object-Oriented APIs Based on Dynamic AnalysisAnas Shatnawi, Hudhaifa Shatnawi, Mohamed Aymen Saied et al.
The reuse at the component level is generally more effective than the one at the object-oriented class level. This is due to the granularity level where components expose their functionalities at an abstract level compared to the fine-grained object-oriented classes. Moreover, components clearly define their dependencies through their provided and required interfaces in an explicit way that facilitates the understanding of how to reuse these components. Therefore, several component identification approaches have been proposed to identify components based on the analysis object-oriented software applications. Nevertheless, most of the existing component identification approaches did not consider co-usage dependencies between API classes to identify classes/methods that can be reused to implement a specific scenario. In this paper, we propose an approach to identify reusable software components in object-oriented APIs, based on the interactions between client applications and the targeted API. As we are dealing with actual clients using the API, dynamic analysis allows to better capture the instances of API usage. Approaches using static analysis are usually limited by the difficulty of handling dynamic features such as polymorphism and class loading. We evaluate our approach by applying it to three Java APIs with eight client applications from the DaCapo benchmark. DaCapo provides a set of pre-defined usage scenarios. The results show that our component identification approach has a very high precision.
SEMar 18, 2017
Systematic Mapping Study of Template-based Code GenerationEugene Syriani, Lechanceux Luhunu, Houari Sahraoui
Template-based code generation (TBCG) is a synthesis technique that produces code from high-level specifications, called templates. TBCG is a popular technique in model-driven engineering (MDE) given that they both emphasize abstraction and automation. Given the diversity of tools and approaches, it is necessary to classify existing TBCG techniques to better guide developers in their choices. The goal of this article is to better understand the characteristics of TBCG techniques and associated tools, identify research trends, and assess the importance of the role of MDE in this code synthesis approach. We conducted a systematic mapping study of the literature to paint an interesting picture about the trends and uses of TBCG. Our study shows that the community has been diversely using TBCG over the past 15 years. TBCG has greatly benefited from MDE. It has favored a template style that is output-based and high level modeling languages as input. TBCG is mainly used to generate source code and has been applied in a variety of domains. Furthermore, both MDE and non-MDE tools are becoming effective development resources in industry.
SEJun 2, 2016
Mining Software Components from Object-Oriented APIsAnas Shatnawi, Abdelhak Seriai, Houari Sahraoui et al.
Object-oriented Application Programing Interfaces (APIs) support software reuse by providing pre-implemented functionalities. Due to the huge number of included classes, reusing and understanding large APIs is a complex task. Otherwise, software components are admitted to be more reusable and understandable entities than object-oriented ones. Thus, in this paper, we propose an approach for reengineering object-oriented APIs into component-based ones. We mine components as a group of classes based on the frequency they are used together and their ability to form a quality-centric component. To validate our approach, we experimented on 100 Java applications that used Android APIs.
SEMar 17, 2015
Modeling and Analyzing Release Trajectory based on the Process of Issue TrackingHani Abdeen, Houari Sahraoui
Software release development process, that we refer to as "release trajectory", involves development activities that are usually sorted in different categories, such as incorporating new features, improving software, or fixing bugs, and associated to "issues". Release trajectory management is a difficult and crucial task. Managers must be aware of every aspect of the development process for managing the software-related issues. Issue Tracking Systems (ITS) play a central role in supporting the management of release trajectory. These systems, which support reporting and tracking issues of different kinds (such as "bug", "feature", "improvement", etc.), record rich data about the software development process. Yet, recorded historical data in ITS are still not well-modeled for supporting practical needs of release trajectory management. In this paper, we describe a sequence analysis approach for modeling and analyzing releases' trajectories, using the tracking process of reported issues. Release trajectory analysis is based on the categories of tracked issues and their temporal changing, and aims to address important questions regarding the co-habitation of unresolved issues, the transitions between different statuses in release trajectory, the recurrent patterns of release trajectories, and the properties of a release trajectory.