Davide Di Ruscio

SE
h-index43
11papers
183citations
Novelty27%
AI Score24

11 Papers

CRApr 1, 2022
Leveraging Privacy Profiles to Empower Users in the Digital Society

Davide Di Ruscio, Paola Inverardi, Patrizio Migliarini et al.

Privacy and ethics of citizens are at the core of the concerns raised by our increasingly digital society. Profiling users is standard practice for software applications triggering the need for users, also enforced by laws, to properly manage privacy settings. Users need to manage software privacy settings properly to protect personally identifiable information and express personal ethical preferences. AI technologies that empower users to interact with the digital world by reflecting their personal ethical preferences can be key enablers of a trustworthy digital society. We focus on the privacy dimension and contribute a step in the above direction through an empirical study on an existing dataset collected from the fitness domain. We find out which set of questions is appropriate to differentiate users according to their preferences. The results reveal that a compact set of semantic-driven questions (about domain-independent privacy preferences) helps distinguish users better than a complex domain-dependent one. This confirms the study's hypothesis that moral attitudes are the relevant piece of information to collect. Based on the outcome, we implement a recommender system to provide users with suitable recommendations related to privacy choices. We then show that the proposed recommender system provides relevant settings to users, obtaining high accuracy.

SEMay 19, 2022
GitRanking: A Ranking of GitHub Topics for Software Classification using Active Sampling

Cezar Sas, Andrea Capiluppi, Claudio Di Sipio et al.

GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various proposals for software application domain classification over the past years. However, these approaches lack a well-defined taxonomy that is hierarchical, grounded in a knowledge base, and free of irrelevant terms. This work proposes GitRanking, a framework for creating a classification ranked into discrete levels based on how general or specific their meaning is. We collected 121K topics from GitHub and considered $60\%$ of the most frequent ones for the ranking. GitRanking 1) uses active sampling to ensure a minimal number of required annotations; and 2) links each topic to Wikidata, reducing ambiguities and improving the reusability of the taxonomy. Our results show that developers, when annotating their projects, avoid using terms with a high degree of specificity. This makes the finding and discovery of their projects more challenging for other users. Furthermore, we show that GitRanking can effectively rank terms according to their general or specific meaning. This ranking would be an essential asset for developers to build upon, allowing them to complement their annotations with more precise topics. Finally, we show that GitRanking is a dynamically extensible method: it can currently accept further terms to be ranked with a minimum number of annotations ($\sim$ 15). This paper is the first collective attempt to build a ground-up taxonomy of software domains.

SEDec 20, 2023Code
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code

Martin Weyssow, Claudio Di Sipio, Davide Di Ruscio et al.

Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence of a long-term temporal dimension in existing code change datasets, limiting their suitability in lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL, comprising 71 machine-learning-based projects mined from Software Heritage. This dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL enables researchers studying the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help studying data distribution shifts within software repositories and the evolution of API usages over time.

SEMar 11, 2021Code
Development of recommendation systems for software engineering: the CROSSMINER experience

Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio et al.

To perform their daily tasks, developers intensively make use of existing resources by consulting open-source software (OSS) repositories. Such platforms contain rich data sources, e.g., code snippets, documentation, and user discussions, that can be useful for supporting development activities. Over the last decades, several techniques and tools have been promoted to provide developers with innovative features, aiming to bring in improvements in terms of development effort, cost savings, and productivity. In the context of the EU H2020 CROSSMINER project, a set of recommendation systems has been conceived to assist software programmers in different phases of the development process. The systems provide developers with various artifacts, such as third-party libraries, documentation about how to use the APIs being adopted, or relevant API function calls. To develop such recommendations, various technical choices have been made to overcome issues related to several aspects including the lack of baselines, limited data availability, decisions about the performance measures, and evaluation approaches. This paper is an experience report to present the knowledge pertinent to the set of recommendation systems developed through the CROSSMINER project. We explain in detail the challenges we had to deal with, together with the related lessons learned when developing and evaluating these systems. Our aim is to provide the research community with concrete takeaway messages that are expected to be useful for those who want to develop or customize their own recommendation systems. The reported experiences can facilitate interesting discussions and research work, which in the end contribute to the advancement of recommendation systems applied to solve different issues in Software Engineering.

SEFeb 15, 2021Code
Recommending API Function Calls and Code Snippets to Support Software Development

Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio et al.

Software development activity has reached a high degree of complexity, guided by the heterogeneity of the components, data sources, and tasks. The proliferation of open-source software (OSS) repositories has stressed the need to reuse available software artifacts efficiently. To this aim, it is necessary to explore approaches to mine data from software repositories and leverage it to produce helpful recommendations. We designed and implemented FOCUS as a novel approach to provide developers with API calls and source code while they are programming. The system works on the basis of a context-aware collaborative filtering technique to extract API usages from OSS projects. In this work, we show the suitability of FOCUS for Android programming by evaluating it on a dataset of 2,600 mobile apps. The empirical evaluation results show that our approach outperforms two state-of-the-art API recommenders, UP-Miner and PAM, in terms of prediction accuracy. We also point out that there is no significant relationship between the categories for apps defined in Google Play and their API usages. Finally, we show that participants of a user study positively perceive the API and source code recommended by FOCUS as relevant to the current development context.

SEJan 20, 2022
Providing Upgrade Plans for Third-party Libraries: A Recommender System using Migration Graphs

Riccardo Rubei, Davide Di Ruscio, Claudio Di Sipio et al.

During the development of a software project, developers often need to upgrade third-party libraries (TPLs), aiming to keep their code up-to-date with the newest functionalities offered by the used libraries. In most cases, upgrading used TPLs is a complex and error-prone activity that must be carefully carried out to limit the ripple effects on the software project that depends on the libraries being upgraded. In this paper, we propose EvoPlan as a novel approach to the recommendation of different upgrade plans given a pair of library-version as input. In particular, among the different paths that can be possibly followed to upgrade the current library version to the desired updated one, EvoPlan is able to suggest the plan that can potentially minimize the efforts being needed to migrate the code of the clients from the library's current release to the target one. The approach has been evaluated on a curated dataset using conventional metrics used in Information Retrieval, i.e., precision, recall, and F-measure. The experimental results show that EvoPlan obtains an encouraging prediction performance considering two different criteria in the plan specification, i.e., the popularity of migration paths and the number of open and closed issues in GitHub for those projects that have already followed the recommended migration paths.

SENov 29, 2021
Enhancing syntax expressiveness in domain-specific modelling

Damiano Di Vicenzo, Juri Di Rocco, Davide Di Ruscio et al.

Domain-specific modelling helps tame the complexity of today's application domains by formalizing concepts and their relationships in modelling languages. While meta-editors are widely-used frameworks for implementing graphical editors for such modelling languages, they are best suitable for defining {novel} topological notations, i.e., syntaxes where the model layout does not contribute to the model semantics. In contrast, many engineering fields, e.g., railways systems or electrical engineering, use notations that, on the one hand, are standard and, on the other hand, are demanding more expressive power than topological syntaxes. In this paper, we discuss the problem of enhancing the expressiveness of modelling editors towards geometric/positional syntaxes. Several potential solutions are experimentally implemented on the jjodel web-based platform with the aim of identifying challenges and opportunities.

SESep 19, 2021
A domain-specific modeling and analysis environment for complex IoT applications

Felicien Ihirwe, Davide Di Ruscio, Silvia Mazzini et al.

To cope with the complexities found in the Internet of Things domain, designers and developers of IoT applications demand practical tools. Several model-driven engineering methodologies and tools have been developed to address such difficulties, but few of them address the analysis aspects. In this extended abstract, we introduce CHESSIoT, a domain-specific modeling environment for complex IoT applications. In addition, the existing supported real-time analysis mechanism, as well as a proposed code generation approach, are presented

SEMay 28, 2021
Towards a modeling and analysis environment for industrial IoT systems

Felicien Ihirwe, Davide Di Ruscio, Silvia Mazzini et al.

The development of Industrial Internet of Things systems (IIoT) requires tools robust enough to cope with the complexity and heterogeneity of such systems, which are supposed to work in safety-critical conditions. The availability of methodologies to support early analysis, verification, and validation is still an open issue in the research community. The early real-time schedulability analysis can help quantify to what extent the desired system's timing performance can eventually be achieved. In this paper, we present CHESSIoT, a model-driven environment to support the design and analysis of industrial IoT systems. CHESSIoT follows a multi-view, component-based modelling approach with a comprehensive way to perform event-based modelling on system components for code generation purposes employing an intermediate ThingML model. To showcase the capability of the extension, we have designed and analysed an Industrial real-time safety use case.

SESep 3, 2020
Low-code Engineering for Internet of things: A state of research

Felicien Ihirwe, Davide Di Ruscio, Silvia Mazzini et al.

Developing Internet of Things (IoT) systems has to cope with several challenges mainly because of the heterogeneity of the involved sub-systems and components. With the aim of conceiving languages and tools supporting the development of IoT systems, this paper presents the results of the study, which has been conducted to understand the current state of the art of existing platforms, and in particular low-code ones, for developing IoT systems. By analyzing sixteen platforms, a corresponding set of features has been identified to represent the functionalities and the services that each analyzed platform can support. We also identify the limitations of already existing approaches and discuss possible ways to improve and address them in the future.

SENov 8, 2016
Protocol for a Systematic Mapping Study on Collaborative Model-Driven Software Engineering

Mirco Franzago, Davide Di Ruscio, Ivano Malavolta et al.

Nowadays, collaborative modeling performed by multiple stakeholders is gaining a growing interest in both academia and practice. However, it poses a set of research challenges, such as large and complex models management, support for multi-user modeling environments, and synchronization mechanisms like models migration and merging, conflicts management, models versioning and rollback support. A body of knowledge in the scientific literature about collaborative model-driven software engineering (MDSE) exists. Still, those studies are scattered across different independent research areas, such as software engineering, model-driven engineering languages and systems, model integrated computing, etc., and a study classifying and comparing the various approaches and methods for collaborative MDSE is still missing. Under this perspective, a systematic mapping study (SMS) can help researchers and practitioners in (i) having a complete, comprehensive and valid picture of the state of the art about collaborative MDSE, and (ii) identifying potential gaps in current research and future research directions.