Leonardo Mariani

SE · h-index 15 · 37 papers · 429 citations · Novelty 35% · AI Score 36

37 Papers

SE · Aug 31, 2023
An Energy-Aware Approach to Design Self-Adaptive AI-based Applications on the Edge

Alessandro Tundo, Marco Mobilio, Shashikant Ilager et al.

The advent of edge devices dedicated to machine learning tasks has enabled the execution of AI-based applications that efficiently process and classify the data acquired by the resource-constrained devices populating the Internet of Things. The proliferation of such applications (e.g., critical monitoring in smart cities) demands new strategies to make these systems energy-sustainable as well. In this paper, we present an energy-aware approach for the design and deployment of self-adaptive AI-based applications that can balance application objectives (e.g., accuracy in object detection and frame processing rate) with energy consumption. We address the problem of determining the set of configurations that can be used to self-adapt the system with a meta-heuristic search procedure that needs only a small number of empirical samples. The final set of configurations is selected using weighted gray relational analysis and mapped to the operation modes of the self-adaptive application. We validate our approach on an AI-based application for pedestrian detection. Results show that our self-adaptive application can outperform non-adaptive baseline configurations, saving up to 81% of energy while losing only between 2% and 6% in accuracy.
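As a rough illustration of the final selection step, the Python sketch below applies textbook weighted gray relational analysis to three hypothetical configurations. It is not the authors' implementation: the criteria, the weights (0.4, 0.2, 0.4), the distinguishing coefficient rho = 0.5, and the function name `grey_relational_grades` are assumptions made for the example. Each criterion is normalized so that 1 is the ideal value, deviations from the ideal are turned into gray relational coefficients, and the weighted sum of the coefficients ranks the candidates.

```python
from typing import List, Sequence


def grey_relational_grades(
    rows: Sequence[Sequence[float]],
    benefit: Sequence[bool],
    weights: Sequence[float],
    rho: float = 0.5,
) -> List[float]:
    """Return one weighted gray relational grade per row; higher is closer to ideal.

    rows:    one sequence of criterion values per candidate configuration
    benefit: True if larger is better for that criterion, False if smaller is better
    weights: relative importance of each criterion (assumed to sum to 1)
    rho:     distinguishing coefficient, conventionally 0.5
    """
    n_cols = len(benefit)
    # 1) Normalize each criterion to [0, 1] so that 1 is always the ideal value.
    norm = [[0.0] * n_cols for _ in rows]
    for j in range(n_cols):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        for i, x in enumerate(col):
            if hi == lo:
                norm[i][j] = 1.0  # constant column: every candidate is equally good
            else:
                norm[i][j] = (x - lo) / (hi - lo) if benefit[j] else (hi - x) / (hi - lo)
    # 2) Deviation of every value from the ideal reference sequence (all ones).
    delta = [[1.0 - v for v in r] for r in norm]
    d_min = min(min(r) for r in delta)
    d_max = max(max(r) for r in delta)
    if d_max == 0.0:  # every candidate is already ideal on every criterion
        return [float(sum(weights))] * len(rows)
    # 3) Gray relational coefficients, aggregated with the criterion weights.
    return [
        sum(w * (d_min + rho * d_max) / (d + rho * d_max) for w, d in zip(weights, r))
        for r in delta
    ]


# Hypothetical candidates: (accuracy, frames per second, energy in joules).
candidates = {
    "high-accuracy": (0.92, 10.0, 50.0),
    "balanced": (0.88, 20.0, 20.0),
    "low-power": (0.80, 25.0, 8.0),
}
grades = grey_relational_grades(
    list(candidates.values()), benefit=(True, True, False), weights=(0.4, 0.2, 0.4)
)
ranking = sorted(zip(candidates, grades), key=lambda kv: kv[1], reverse=True)
print(ranking)  # low-power first, then balanced, then high-accuracy
```

With energy weighted as heavily as accuracy, the low-power configuration ranks first; shifting weight towards accuracy can change the ranking, which is exactly the trade-off the weights are meant to express.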

SE · Jan 14
"Where is My Troubleshooting Procedure?": Studying the Potential of RAG in Assisting Failure Resolution of Large Cyber-Physical System

Maria Teresa Rossi, Leonardo Mariani, Oliviero Riganelli et al.

In today's complex industrial environments, operators must often navigate extensive technical manuals to identify troubleshooting procedures that may help them react to observed failure symptoms. These manuals, written in natural language, describe many steps in detail. Unfortunately, the number, length, and complexity of these descriptions can significantly slow down and complicate the retrieval of the correct procedure during critical incidents. Retrieval-Augmented Generation (RAG) enables the development of tools based on conversational interfaces that can assist operators in their retrieval tasks, improving their ability to respond to incidents. This paper presents the results of a set of experiments based on the analysis of the troubleshooting procedures available at Fincantieri, a large international company developing complex naval cyber-physical systems. Results show that RAG can assist operators in reacting promptly to failure symptoms, although specific measures must be taken to cross-validate recommendations before acting on them.

SE · Feb 13, 2024
Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot

Ionut Daniel Fagadau, Leonardo Mariani, Daniela Micucci et al.

Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content tailored to developers' actual needs. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond with ready-to-use code snippets. Formulating the prompt appropriately, incorporating useful information while avoiding information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate how eight prompt features, concerning the style and content of prompts, influence the correctness, complexity, and size of the generated code and its similarity to the developers' code. We specifically consider the task of using Copilot to generate the implementation of 200 Java methods with 124,800 prompts obtained by systematically combining the eight considered prompt features. Results show that some prompt features, such as the presence of examples and a summary of the method's purpose, can significantly influence the quality of the result.

SE · Jan 18, 2024
MutaBot: A Mutation Testing Approach for Chatbots

Michael Ferdinando Urrico, Diego Clerissi, Leonardo Mariani

Mutation testing is a technique aimed at assessing the effectiveness of test suites by seeding artificial faults into programs. Although mutation testing tools are available for many platforms and languages, none is currently available for conversational chatbots, which are an increasingly popular way to design systems that interact with users through a natural language interface. Since conversations must be explicitly engineered by the developers of conversational chatbots, these systems are exposed to specific types of faults not supported by existing mutation testing tools. In this paper, we present MutaBot, a mutation testing tool for conversational chatbots. MutaBot addresses mutations at multiple levels, including conversational flows, intents, and contexts. We designed the tool to target multiple platforms and implemented initial support for Google Dialogflow chatbots. We assessed the tool with three Dialogflow chatbots and test cases generated with Botium, revealing weaknesses in the test suites.
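To make the idea of mutating a chatbot concrete, the Python sketch below seeds faults into a toy, word-overlap intent matcher and computes a mutation score. Everything here is hypothetical: the bot structure, the `reply` matcher, and the two operators (dropping a training phrase, swapping two intents' responses) are illustrative assumptions, not MutaBot's operators or the Dialogflow API. A mutant is killed when at least one test observes a different response.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy chatbot: intent name -> (training phrases, response).
Bot = Dict[str, Tuple[List[str], str]]
FALLBACK = "Sorry, I did not understand."


def reply(bot: Bot, utterance: str) -> str:
    """Naive intent matching: pick the intent whose best training phrase shares most words."""
    words = set(utterance.lower().split())
    best_response, best_score = FALLBACK, 0
    for phrases, response in bot.values():
        score = max((len(words & set(p.lower().split())) for p in phrases), default=0)
        if score > best_score:
            best_response, best_score = response, score
    return best_response


def mutants(bot: Bot) -> Iterator[Tuple[str, Bot]]:
    """Two illustrative mutation operators (not MutaBot's actual operator set)."""
    # Operator 1: delete one training phrase from an intent.
    for intent, (phrases, response) in bot.items():
        for k in range(len(phrases)):
            mutant = dict(bot)
            mutant[intent] = (phrases[:k] + phrases[k + 1:], response)
            yield f"drop-phrase:{intent}#{k}", mutant
    # Operator 2: swap the responses of two intents.
    for a, b in combinations(bot, 2):
        mutant = dict(bot)
        mutant[a] = (bot[a][0], bot[b][1])
        mutant[b] = (bot[b][0], bot[a][1])
        yield f"swap-response:{a}/{b}", mutant


def mutation_score(bot: Bot, tests: List[Tuple[str, str]]) -> Tuple[float, List[str], List[str]]:
    """Return (score, killed mutants, surviving mutants)."""
    killed, survived = [], []
    for name, mutant in mutants(bot):
        if any(reply(mutant, utterance) != expected for utterance, expected in tests):
            killed.append(name)
        else:
            survived.append(name)
    total = len(killed) + len(survived)
    return (len(killed) / total if total else 0.0), killed, survived


bot: Bot = {
    "greet": (["hello there", "hi"], "Hello! How can I help?"),
    "hours": (["opening hours", "when are you open"], "We are open 9-17."),
}
tests = [("hi", "Hello! How can I help?"), ("what are your opening hours", "We are open 9-17.")]
score, killed, survived = mutation_score(bot, tests)
print(score, survived)
```

Here only two of the five mutants are killed; the survivors point at untested inputs (for example, nothing exercises "hello there"), which is the kind of test-suite weakness a mutation tool is meant to expose.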

SE · Feb 24, 2022
Proactive Libraries: Enforcing Correct Behaviors in Android Apps

Oliviero Riganelli, Ionut Daniel Fagadau, Daniela Micucci et al.

The Android framework provides a rich set of APIs that developers can exploit to build their apps. However, the rapid evolution of these APIs, together with the specific characteristics of the lifecycle of Android components, challenges developers, who may release apps that use APIs incorrectly. In this demo, we present Proactive Libraries, a tool that can be used to decorate regular libraries with the capability of proactively detecting and healing API misuses at runtime. Proactive Libraries blend libraries with multiple proactive modules that collect data, check the compliance of API usages with correctness policies, and heal executions as soon as a possible policy violation is detected. The results of our evaluation with 27 possible API misuses show that Proactive Libraries effectively correct API misuses with negligible runtime overhead.

SE · Jan 3, 2022
Exception-Driven Fault Localization for Automated Program Repair

Davide Ginelli, Oliviero Riganelli, Daniela Micucci et al.

Automated Program Repair (APR) techniques typically exploit spectrum-based fault localization (SBFL) to identify the program locations that should be patched, making the effectiveness of APR techniques dependent on the effectiveness of fault localization. However, results show that SBFL often does not localize faults accurately, hindering the effectiveness of APR. In this paper, we propose EXCEPT, a technique that addresses the localization problem by focusing on the semantics of failures rather than on the correlation between the executed statements and the failed tests, as SBFL does. We focus on failures due to exceptions and exploit their type and source to localize and guess the faults. Experiments with 43 exception-raising faults from the Defects4J benchmark show that EXCEPT can perform better than Ochiai and ssFix.
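For readers unfamiliar with the SBFL baseline that EXCEPT is compared against, the Python sketch below ranks lines by the standard Ochiai formula, ef / sqrt((ef + nf) * (ef + ep)), where ef and ep count the failing and passing tests that execute a line and ef + nf is the total number of failing tests. The coverage data and the function name `ochiai_ranking` are hypothetical; the sketch only shows the correlation-based scoring the abstract contrasts with exception semantics.

```python
import math
from typing import Dict, List, Set, Tuple


def ochiai_ranking(coverage: Dict[str, Set[int]], failing: Set[str]) -> List[Tuple[int, float]]:
    """Rank program lines by Ochiai suspiciousness.

    coverage maps each test name to the set of line numbers it executed;
    failing is the set of failing test names.
    """
    total_failed = len(failing)  # ef + nf for every line
    lines = set().union(*coverage.values())
    scores: Dict[int, float] = {}
    for line in lines:
        ef = sum(1 for t, cov in coverage.items() if line in cov and t in failing)
        ep = sum(1 for t, cov in coverage.items() if line in cov and t not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    # Most suspicious first; ties broken by line number.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectrum: three tests, only t3 fails.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 3, 4}}
print(ochiai_ranking(coverage, failing={"t3"}))
```

In this toy spectrum, lines 3 and 4 receive exactly the same score, so the correlation between coverage and test outcomes alone cannot tell them apart.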

NE · Oct 6, 2021
Cloud Failure Prediction with Hierarchical Temporal Memory: An Empirical Assessment

Oliviero Riganelli, Paolo Saltarel, Alessandro Tundo et al.

Hierarchical Temporal Memory (HTM) is an unsupervised learning algorithm inspired by the features of the neocortex that can be used to continuously process streaming data and detect anomalies, without requiring either a large amount of training data or labeled data. HTM also learns continuously from samples, providing a model that is always up to date with respect to observations. These characteristics make HTM particularly suitable for supporting online failure prediction in cloud systems, which are systems with dynamically changing behavior that must be monitored to anticipate problems. This paper presents the first systematic study that assesses HTM in the context of failure prediction. The results that we obtained considering 72 configurations of HTM applied to 12 different types of faults introduced in the Clearwater cloud system show that HTM can help to predict failures with sufficient effectiveness (F-measure = 0.76), representing an interesting practical alternative to (semi-)supervised algorithms.

SE · Apr 12, 2021
An Evolutionary Approach to Adapt Tests Across Mobile Apps

Leonardo Mariani, Mauro Pezzè, Valerio Terragni et al.

Automatic generators of GUI tests often fail to generate semantically relevant test cases, and thus miss important test scenarios. To address this issue, test adaptation techniques can be used to automatically generate semantically meaningful GUI tests from test cases of applications with similar functionalities. In this paper, we present ADAPTDROID, a technique that approaches the test adaptation problem as a search problem and uses evolutionary testing to adapt GUI tests (including oracles) across similar Android apps. In our evaluation with 32 popular Android apps, ADAPTDROID successfully adapted semantically relevant test cases in 11 out of 20 cross-app adaptation scenarios.

SE · Feb 28, 2021
On Introducing Automatic Test Case Generation in Practice: A Success Story and Lessons Learned

Matteo Brunetto, Giovanni Denaro, Leonardo Mariani et al.

The level and quality of automation dramatically affect software testing activities, determine the costs and effectiveness of the testing process, and largely impact the quality of the final product. While the costs and benefits of automating many testing activities in industrial practice (including managing the quality process, executing large test suites, and managing regression test suites) are well understood and documented, the benefits and obstacles of automatically generating system test suites in industrial practice are not yet well reported, despite the recent progress of automated test case generation tools. Proprietary tools for automatically generating test cases are becoming common practice in large software organisations, and commercial tools are becoming available for some application domains and testing levels. However, generating system test cases in small and medium-size software companies is still largely a manual, inefficient, and ad hoc activity. This paper reports our experience in introducing techniques for automatically generating system test suites in a medium-size company. We describe the technical and organisational obstacles that we faced when introducing automatic test case generation in the development process of the company, and present the solutions that we successfully applied in that context. In particular, the paper discusses the problems of automating the generation of test cases for a customised ERP application that the medium-size company developed for a third-party multinational company, and presents ABT2.0, the test case generator that we developed by tailoring ABT, a state-of-the-art research GUI test generator, to their industrial environment. The paper presents the new features of ABT2.0 and discusses how they address the issues that we faced.

SE · Jan 1, 2021
Declarative Dashboard Generation

Alessandro Tundo, Chiara Castelnovo, Marco Mobilio et al.

Systems of systems are highly dynamic software systems that require flexible monitoring solutions to be observed and controlled. Indeed, operators have to frequently adapt the set of collected indicators to changing circumstances, in order to visualize the behavior of the monitored systems and take timely action if needed. Unfortunately, dashboard systems are still quite cumbersome to configure and to adapt to a changing set of indicators that must be visualized. This paper reports our initial effort towards the definition of an automatic dashboard generation process that exploits metamodel layouts to create a full dashboard from a set of indicators selected by operators.

SE · Dec 31, 2020
FILO: FIx-LOcus Localization for Backward Incompatibilities Caused by Android Framework Upgrades

Marco Mobilio, Oliviero Riganelli, Daniela Micucci et al.

Mobile operating systems evolve quickly, frequently updating the APIs that app developers use to build their apps. Unfortunately, API updates do not always guarantee backward compatibility, causing apps to no longer work properly or even crash when running with an updated system. This paper presents FILO, a tool that assists Android developers in resolving backward compatibility issues introduced by API upgrades. FILO both suggests the method that needs to be modified in the app in order to adapt the app to an upgraded API, and reports key symptoms observed in the failed execution to facilitate the fixing activity. Results obtained with the analysis of 12 actual upgrade problems and the feedback produced by early tool adopters show that FILO can practically support Android developers. FILO can be downloaded from https://gitlab.com/learnERC/filo, and its video demonstration is available at https://youtu.be/WDvkKj-wnlQ.

SE · Dec 11, 2020
A Comprehensive Study of Code-removal Patches in Automated Program Repair

Davide Ginelli, Matias Martinez, Leonardo Mariani et al.

Automatic Program Repair (APR) techniques are a promising way to reduce the cost of debugging. Many relevant APR techniques follow the generate-and-validate approach, that is, the faulty program is iteratively modified with different change operators and then validated with a test suite until a plausible patch is generated. In particular, Kali is a generate-and-validate technique developed to investigate the possibility of generating plausible patches by only removing code. Previous studies show that Kali indeed successfully addressed several faults. This paper investigates code-removal patches in automated program repair, studying the reasons and scenarios that make their creation possible and their relationship with the patches implemented by developers. Our study reveals that code-removal patches are often insufficient to fix bugs, and proposes a comprehensive taxonomy of code-removal patches that provides evidence of the problems that may affect test suites, opening new opportunities for researchers in the field of automatic program repair.
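The Python sketch below illustrates, on a deliberately tiny example, why a patch that only removes code can be plausible yet wrong. It is a simplified generate-and-validate loop with a single "delete one statement" operator, loosely in the spirit of Kali but not its implementation; the program, the fault, the tests, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
Stmt = Callable[[State], None]
Test = Tuple[State, int]  # (inputs, expected value of "out")


def init_out(s: State) -> None:
    s["out"] = s["x"]


def clamp(s: State) -> None:
    # Hypothetical fault: the intended upper bound is 100, not 10.
    if s["x"] > 10:
        s["out"] = 10


def passes(program: List[Stmt], test: Test) -> bool:
    inputs, expected = test
    state = dict(inputs)
    try:
        for stmt in program:
            stmt(state)
    except Exception:
        return False  # a crashing candidate counts as failing
    return state.get("out") == expected


def remove_statement_repair(program: List[Stmt], tests: List[Test]) -> Optional[List[Stmt]]:
    """Generate-and-validate with a single operator: delete one statement."""
    for i in range(len(program)):
        candidate = program[:i] + program[i + 1:]
        if all(passes(candidate, t) for t in tests):
            return candidate  # plausible patch: passes every available test
    return None


program = [init_out, clamp]
weak_tests = [({"x": 5}, 5), ({"x": 50}, 50)]
patch = remove_statement_repair(program, weak_tests)
print([f.__name__ for f in patch])        # the clamp has simply been deleted
print(passes(patch, ({"x": 150}, 100)))   # the intended clamping behavior is lost
```

The patch passes the weak test suite, but a test with x = 150 shows that deleting the faulty check also deleted the intended behavior; the weakness lies in the tests as much as in the repair.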

SE · Oct 12, 2020
Data Loss Detector: Automatically Revealing Data Loss Bugs in Android Apps

Oliviero Riganelli, Simone Paolo Mottadelli, Claudio Rota et al.

Android apps must work correctly even if their execution is interrupted by external events. For instance, an app must work properly even if a phone call is received, or after its layout is redrawn because the smartphone has been rotated. Since these events may require destroying the foreground activity of the app when the execution is interrupted and recreating it when the execution is resumed, the only way to prevent the loss of state information is to save and restore it. This behavior must be explicitly implemented by app developers, who often fail to implement it properly, releasing apps affected by data loss problems, that is, apps that may lose state information when their execution is interrupted. Although several techniques can automatically generate test cases for Android apps, the obtained test cases seldom include the interactions and checks necessary to exercise and reveal data loss faults. To address this problem, this paper presents Data Loss Detector (DLD), a test case generation technique that integrates an exploration strategy, data-loss-revealing actions, and two customized oracle strategies for the detection of data loss failures. DLD revealed 75% of the faults in a benchmark of 54 Android app releases affected by 110 known data loss faults. DLD also revealed unknown data loss problems, outperforming competing approaches.

SE · Oct 8, 2020
Test4Enforcers: Test Case Generation for Software Enforcers

Michell Guzman, Oliviero Riganelli, Daniela Micucci et al.

Software enforcers can be used to modify the runtime behavior of software applications to guarantee that relevant correctness policies are satisfied. However, implementing software enforcers can be tricky, due to the heterogeneity of the situations that they must be able to handle. Assessing their ability to steer the behavior of the target system without introducing side effects is an important challenge to fully trust the resulting system. To address this challenge, this paper presents Test4Enforcers, the first approach to derive thorough test suites that can validate the impact of enforcers on a target system. The paper also shows how to implement the Test4Enforcers approach in the DroidBot test generator to validate enforcers for Android apps.

SE · Feb 5, 2020
CBR: Controlled Burst Recording

Oscar Cornejo, Daniela Briola, Daniela Micucci et al.

Collecting traces from software running in the field is both useful and challenging. Traces may help reveal unexpected usage scenarios, detect and reproduce failures, and build behavioral models that reflect how the software is actually used. On the other hand, recording traces is an intrusive activity that, if not properly designed, may annoy users and negatively affect the usability of the applications. In this paper, we address field monitoring by introducing Controlled Burst Recording, a monitoring solution that can collect comprehensive runtime data without compromising the quality of the user experience. The technique encodes the knowledge extracted from the monitored application as a finite state model that represents both the sequences of operations that users can execute and the corresponding internal computations that each operation might activate. Our initial assessment with information extracted from ArgoUML shows that Controlled Burst Recording can reconstruct behavioral information more effectively than competing sampling techniques, with a low impact on the system response time.

SE · Feb 5, 2020
A Framework for In-Vivo Testing of Mobile Applications

Mariano Ceccato, Davide Corradini, Luca Gazzola et al.

The ecosystem in which mobile applications run is highly heterogeneous and configurable. All layers upon which mobile apps are built offer wide possibilities of variation, from the device and the hardware, to the operating system and middleware, up to the user preferences and settings. Exhaustively testing all possible configurations before releasing the app is unaffordable. As a consequence, the app may exhibit different, including faulty, behaviours when executed in the field under specific configurations. In this paper, we describe a framework that can be instantiated to support in-vivo testing of a mobile app. The framework monitors the configuration in the field and triggers in-vivo testing when an untested configuration is recognized. Experimental results show that the overhead introduced by monitoring ranges from unnoticeable to negligible (0-6%), depending on the device being used (high- vs. low-end). In-vivo test execution required 3 s on average: if performed upon screen lock activation, it introduces only a slight delay before the device locks.

SE · Jan 20, 2020
In-The-Field Monitoring of Functional Calls: Is It Feasible?

Oscar Cornejo, Daniela Briola, Daniela Micucci et al.

Collecting data about the sequences of function calls executed by an application running in the field can be useful for a number of purposes, including failure reproduction, profiling, and debugging. Unfortunately, collecting data from the field may introduce annoying slowdowns that negatively affect the quality of the user experience. So far, the impact of monitoring has mainly been studied in terms of the overhead it introduces in the monitored applications, rather than whether users can actually perceive that overhead. In this paper, we take a different perspective, studying to what extent collecting data about sequences of function calls may impact the quality of the user experience, producing recognizable effects. Interestingly, we found that, depending on the nature of the executed operation and its execution context, users may tolerate a non-trivial overhead. This information can potentially be exploited to collect a significant amount of data without annoying users.

SE · Nov 21, 2019
FILO: FIx-LOcus Recommendation for Problems Caused by Android Framework Upgrade

Marco Mobilio, Oliviero Riganelli, Daniela Micucci et al.

Dealing with the evolution of operating systems is challenging for developers of mobile apps, who have to cope with frequent upgrades that often include backward incompatible changes of the underlying API framework. As a consequence of framework upgrades, apps may show misbehaviours and unexpected crashes once executed within an evolved environment. Identifying the portion of the app that must be modified to correctly execute on a newly released operating system can be challenging. Although incompatibilities are visible at the level of the interactions between the app and its execution environment, the actual methods to be changed are often located in classes that do not directly interact with any external element. To facilitate debugging activities for problems introduced by backward incompatible upgrades of the operating system, this paper presents FILO, a technique that can recommend the method that must be changed to implement the fix from the analysis of a single failing execution. FILO can also select key symptomatic anomalous events that can help the developer understand the reason for the failure and facilitate the implementation of the fix. Our evaluation with multiple known compatibility problems introduced by Android upgrades shows that FILO can effectively and efficiently identify the faulty methods in the apps.

SE · Nov 21, 2019
Controlling Interactions with Libraries in Android Apps Through Runtime Enforcement

Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Android applications are executed on smartphones equipped with a variety of resources that must be properly accessed and controlled; otherwise, the correctness of the executions and the stability of the entire environment might be negatively affected. For example, apps must properly acquire, use, and release microphones, cameras, and other multimedia devices; otherwise, the behavior of other apps that use the same resources might be compromised. Unfortunately, several apps do not use resources correctly, for instance due to faults and inaccurate design decisions. By interacting with these apps, users may experience unexpected behaviors, which in turn may cause instability and sporadic failures, especially when resources are accessed. In this paper, we present an approach that lets users protect their environment from apps that use resources improperly by enforcing the correct usage protocol. This is achieved by using software enforcers that can observe executions and change them when necessary. For instance, enforcers can detect that a resource has been acquired but not released, and automatically perform the release operation, thus making that same resource available to other apps. The main idea is that software libraries, in particular those controlling access to resources, can be augmented with enforcers that users can activate and deactivate on demand to protect their environment from unwanted app behaviors. We call software libraries augmented with one or more enforcers proactive libraries, because activating an enforcer decorates the library with proactive behaviors that can guarantee the correctness of the execution even when apps invoke the operations implemented by the library incorrectly.
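The "acquired but not released" example can be sketched in a few lines of Python. This is only an illustration of the enforcement idea under stated assumptions: `SharedCamera`, `ReleaseOnPauseEnforcer`, and the `on_pause` hook are hypothetical stand-ins, not Android APIs and not the authors' proactive libraries. The enforcer observes a lifecycle event and performs the missing release only when the policy would otherwise be violated, and it can be switched off on demand.

```python
from typing import List, Optional


class SharedCamera:
    """Hypothetical exclusive resource: at most one app may hold it at a time."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def acquire(self, app: str) -> None:
        if self.holder is not None:
            raise RuntimeError(f"camera busy (held by {self.holder})")
        self.holder = app

    def release(self, app: str) -> None:
        if self.holder == app:
            self.holder = None


class ReleaseOnPauseEnforcer:
    """Policy: an app moved to the background must not keep holding the camera."""

    def __init__(self, camera: SharedCamera) -> None:
        self.camera = camera
        self.enabled = True  # users can activate or deactivate the enforcer on demand
        self.log: List[str] = []

    def on_pause(self, app: str) -> None:
        # Observe the lifecycle event; heal only if the policy would be violated.
        if self.enabled and self.camera.holder == app:
            self.camera.release(app)  # perform the release the app forgot
            self.log.append(f"released camera on behalf of {app}")


camera = SharedCamera()
enforcer = ReleaseOnPauseEnforcer(camera)
camera.acquire("buggy_app")     # acquires but never releases
enforcer.on_pause("buggy_app")  # app goes to background; the enforcer steps in
camera.acquire("other_app")     # succeeds because the camera was freed
print(enforcer.log)
```

With the enforcer disabled, the second `acquire` would raise, which mirrors the resource-contention failures the abstract describes.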

SE · Sep 18, 2019
Anomaly Detection As-a-Service

Marco Mobilio, Matteo Orrù, Oliviero Riganelli et al.

Cloud systems are complex, large, and dynamic systems whose behavior must be continuously analyzed to detect misbehaviors and failures in a timely manner. Although there are solutions to flexibly monitor cloud systems, cost-effectively controlling the anomaly detection logic is still a challenge. In particular, cloud operators may need to quickly change the types of detected anomalies and the scope of anomaly detection, for instance based on observations. This kind of intervention is still a largely manual, inefficient, and ad hoc effort. In this paper, we present Anomaly Detection as-a-Service (ADaaS), which uses the as-a-service paradigm often exploited in cloud systems to declaratively control the anomaly detection logic. Operators can use ADaaS to specify the set of indicators that must be analyzed and the types of anomalies that must be detected, without having to address any operational aspect. Early results with lightweight detectors show that the presented approach is a promising solution to deliver better control of the anomaly detection logic.

SE · May 27, 2019
A Benchmark of Data Loss Bugs for Android Apps

Oliviero Riganelli, Marco Mobilio, Daniela Micucci et al.

Android apps must be able to deal with both stop events, which require immediately stopping the execution of the app without losing state information, and start events, which require resuming the execution of the app at the same point where it was stopped. Support for these kinds of events must be explicitly implemented by developers, who unfortunately often fail to implement the proper logic for saving and restoring the state of an app. As a consequence, apps can lose data when moved to the background and then back to the foreground (e.g., to answer a call) or when the screen is simply rotated. These faults can cause annoying usability issues and unexpected crashes. This paper presents a public benchmark of 110 data loss faults in Android apps that we systematically collected to facilitate research and experimentation with these problems. The benchmark is available on GitLab and includes the faulty apps, the fixed apps (when available), the test cases to automatically reproduce the problems, and additional information that may help researchers in their tasks.

SE · Mar 29, 2019
Automatic Failure Explanation in CPS Models

Ezio Bartocci, Niveditha Manjunath, Leonardo Mariani et al.

Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, detecting a failure alone is insufficient to know how to correct a faulty model. Faults can propagate in time and space, producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason for an observed failure is typically a challenging and laborious task left to the experience and domain knowledge of the designer. In this paper, we propose CPSDebug, a novel approach that, by combining testing, specification mining, and failure analysis, can automatically explain failures in Simulink/Stateflow models. We evaluate CPSDebug on two case studies, involving two use scenarios and several classes of faults, demonstrating the potential value of our approach.

SE · Feb 11, 2019
COST Action IC 1402 ArVI: Runtime Verification Beyond Monitoring -- Activity Report of Working Group 1

Wolfgang Ahrendt, Cyrille Artho, Christian Colombo et al.

This report presents the activities of the first working group of the COST Action ArVI, Runtime Verification beyond Monitoring. The report aims to provide an overview of some of the major core aspects involved in Runtime Verification. Runtime Verification is the field of research dedicated to the analysis of system executions. It is often seen as a discipline that studies how a system run satisfies or violates correctness properties. The report presents a taxonomy of Runtime Verification (RV), introducing the terminology associated with the main concepts of the field. The report also develops the concept of instrumentation, the various ways to instrument systems, and the fundamental role of instrumentation in designing an RV framework. We also discuss how RV interplays with other verification techniques such as model-checking, deductive verification, model learning, testing, and runtime assertion checking. Finally, we propose challenges in monitoring quantitative and statistical data beyond detecting property violations.

SE · Oct 11, 2018
Increasing the Reusability of Enforcers with Lifecycle Events

Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Runtime enforcement can be effectively used to improve the reliability of software applications. However, it often requires the definition of ad hoc policies and enforcement strategies, which might be expensive to identify and implement. This paper discusses how to exploit lifecycle events to obtain useful enforcement strategies that can be easily reused across applications, thus reducing the cost of adoption of the runtime enforcement technology. The paper finally sketches how this idea can be used to define libraries that can automatically overcome problems related to applications misusing them.

SE · Jul 19, 2018
Model-Based Monitoring for IoTs Smart Cities Applications

Matteo Orrù, Marco Mobilio, Anas Shatnawi et al.

Smart Cities are future urban aggregations where a multitude of heterogeneous systems and IoT devices interact to provide a safer, more efficient, and greener environment. The vision of smart cities evolves along with software and IoT-based services. The current trend is not to have a single big comprehensive system, but a plethora of small, well-integrated systems that interact with one another. Monitoring these kinds of systems is challenging for a number of reasons.

SE · Mar 14, 2018
CloudHealth: A Model-Driven Approach to Watch the Health of Cloud Services

Anas Shatnawi, Matteo Orrù, Marco Mobilio et al.

Cloud systems are complex and large systems where services provided by different operators must coexist and possibly cooperate. In such a complex environment, controlling the health of both the whole environment and the individual services is extremely important to react promptly and effectively to misbehaviours, unexpected events, and failures. Although there are solutions to monitor cloud systems at different granularity levels, how to relate the many KPIs that can be collected about the health of the system and how health information can be properly reported to operators are open questions. This paper reports the early results we achieved in the challenge of monitoring the health of cloud systems. In particular, we present CloudHealth, a model-based health monitoring approach that operators can use to watch specific quality attributes. The CloudHealth Monitoring Model describes how to operationalize high-level monitoring goals by dividing them into subgoals, deriving metrics for the subgoals, and using probes to collect the metrics. We use the CloudHealth Monitoring Model to control the probes that must be deployed on the target system, the KPIs that are dynamically collected, and the visualization of the data in dashboards.

SE · Mar 1, 2018
Localizing Faults in Cloud Systems

Leonardo Mariani, Cristina Monni, Mauro Pezzé et al.

By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to optimize the operative costs of software systems, but it significantly impacts the reliability of software applications. The lack of control of applications over Cloud execution environments largely limits the applicability of state-of-the-art approaches that address reliability issues by relying on heavyweight training with injected faults. In this paper, we propose LOUD, a lightweight fault localization approach that relies on positive training only, and can thus operate within the constraints of Cloud systems. LOUD relies on machine learning and graph theory. It trains machine learning models with correct executions only, and compensates for the inaccuracy that derives from training with positive samples by elaborating the outcome of machine learning techniques with graph theory algorithms. The experimental results reported in this paper confirm that LOUD can localize faults with high precision, relying only on lightweight positive training.

SEAug 30, 2017
An Exploratory Study of Field Failures

Luca Gazzola, Leonardo Mariani, Fabrizio Pastore et al.

Field failures, that is, failures caused by faults that escape the testing phase and manifest in the field, are unavoidable. Improving verification and validation activities before deployment can identify and promptly remove many, but not all, faults, and users may still experience a number of annoying problems while using their software systems. This paper investigates the nature of field failures, to understand to what extent further improving in-house verification and validation activities can reduce the number of failures in the field, and frames the need for new approaches that operate in the field. We report the results of the analysis of the bug reports of five applications belonging to three different ecosystems, propose a taxonomy of field failures, and discuss the reasons why failures belonging to the identified classes cannot be detected at design time but must be addressed at runtime. We observe that many faults (70%) are intrinsically hard to detect at design time.

SEAug 24, 2017
Fragmented Monitoring

Oscar Cornejo, Daniela Briola, Daniela Micucci et al.

Field data is an invaluable source of information for testers and developers because it witnesses how software systems operate in real environments, capturing scenarios and configurations relevant to end-users. Unfortunately, collecting traces might be resource-consuming and can significantly affect the user experience, for instance by causing annoying slowdowns. Existing monitoring techniques can control the overhead introduced in the applications by reducing the amount of collected data, for instance by collecting each event only with a given probability. However, collecting fewer events limits the amount of information extracted from the field and may fail to provide a comprehensive picture of the behavior of a program. In this paper we present fragmented monitoring, a monitoring technique that addresses the issue of collecting information from the field without annoying users. The key idea of fragmented monitoring is to reduce the overhead by recording partial traces (fragments) instead of full traces, while annotating the beginning and the end of each fragment with state information. These annotations are exploited offline to derive traces that are likely to be observed in the field but that could not be collected directly because of the overhead that full tracing would introduce in the program.
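The offline step, deriving longer traces from state-annotated fragments, can be sketched as chaining fragments whose start state matches the end state of the previous one. This is a minimal sketch assuming that state annotations are plain labels compared by exact equality and that chains are bounded by a hypothetical `max_fragments` parameter; the actual technique may match richer state information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start_state: str         # state annotation recorded when the fragment begins
    events: tuple[str, ...]  # events recorded inside the fragment
    end_state: str           # state annotation recorded when the fragment ends


def derive_traces(fragments, initial_state, max_fragments=3):
    """Chain fragments whose start state equals the previous end state,
    returning every candidate trace of 1..max_fragments fragments."""
    results = []

    def extend(state, trace, depth):
        if depth > 0:
            results.append(trace)
        if depth == max_fragments:
            return
        for f in fragments:
            if f.start_state == state:
                extend(f.end_state, trace + f.events, depth + 1)

    extend(initial_state, (), 0)
    return results


# Hypothetical fragments collected from different users or sessions.
fragments = [
    Fragment("home", ("open", "edit"), "editor"),
    Fragment("editor", ("save",), "home"),
    Fragment("editor", ("close",), "home"),
]
traces = derive_traces(fragments, "home", max_fragments=2)
```

No single fragment contains `("open", "edit", "save")`, yet the matching `editor` annotations let it be reconstructed offline, which is the kind of trace the technique aims to recover without recording it in full.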

SEAug 7, 2017
VART: A Tool for the Automatic Detection of Regression Faults

Fabrizio Pastore, Leonardo Mariani

In this paper we present VART, a tool for automatically revealing regression faults missed by regression test suites. Interestingly, VART is not limited to faults causing crashes or exceptions, but can also reveal faults that violate application-specific correctness properties. VART achieves this goal by combining static and dynamic program analysis.
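To give a flavor of how dynamic analysis can surface violations of application-specific properties rather than only crashes, the sketch below infers simple range invariants from executions of the old version and reports values from the new version that break them. It covers only a dynamic-analysis idea; the static analysis that VART combines with it is not shown, and the choice of range invariants, the program point names, and the data are our own assumptions.

```python
def infer_ranges(observations):
    """Infer a [min, max] range invariant per (program point, variable)."""
    ranges = {}
    for point, var, value in observations:
        lo, hi = ranges.get((point, var), (value, value))
        ranges[(point, var)] = (min(lo, value), max(hi, value))
    return ranges


def violations(ranges, observations):
    """Report observations from the new version that break an inferred invariant."""
    found = []
    for point, var, value in observations:
        if (point, var) in ranges:
            lo, hi = ranges[(point, var)]
            if not lo <= value <= hi:
                found.append((point, var, value, (lo, hi)))
    return found


# Hypothetical values observed at the exit of a `withdraw` function.
old_version = [("withdraw:exit", "balance", 100),
               ("withdraw:exit", "balance", 0),
               ("withdraw:exit", "balance", 50)]
new_version = [("withdraw:exit", "balance", -20),
               ("withdraw:exit", "balance", 30)]
report = violations(infer_ranges(old_version), new_version)
```

A negative balance raises no exception, yet it violates the property `0 <= balance <= 100` learned from the old version. Inferred invariants are only likely properties, so such reports need confirmation, which is one reason to pair them with a static check.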

SEAug 4, 2017
BDCI: Behavioral Driven Conflict Identification

Fabrizio Pastore, Leonardo Mariani, Daniela Micucci

Source Code Management (SCM) systems support software evolution by providing features such as version control, branching, and conflict detection. Despite the presence of these features, support for parallel software development is often limited. SCM systems can only address a subset of the conflicts that developers might introduce when concurrently working on multiple parallel branches. In fact, SCM systems can detect textual conflicts, which are generated by concurrent modifications of the same program locations, but they are unable to detect higher-order conflicts, which are generated by concurrent modifications of different program locations that cause program misbehaviors once merged. Higher-order conflicts are painful to detect and expensive to fix because they might originate from the interference of apparently unrelated changes. In this paper we present Behavioral Driven Conflict Identification (BDCI), a novel approach to conflict detection. BDCI moves the analysis of conflicts from the source code level to the level of program behavior by generating and comparing behavioral models. The analysis based on behavioral models can reveal interfering changes as soon as they are introduced in the SCM system, even if they do not introduce any textual conflict. To evaluate the effectiveness and the cost of the proposed approach, we developed BDCIf, a specific instance of BDCI dedicated to the detection of higher-order conflicts related to the functional behavior of a program. The evidence collected by analyzing multiple versions of Git and Redis suggests that BDCIf can effectively detect higher-order conflicts and report how changes might interfere.
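One way to picture "generating and comparing behavioral models" is to treat each model as a set of observed properties and flag merged behavior that neither branch explains on its own. The sketch below is our own simplification, not BDCI's comparison procedure: models are assumed to be sets of property strings, and the expected merged model is assumed to be the base model updated with both branches' additions and removals.

```python
def delta(base, version):
    """Properties a version adds to, and removes from, the base model."""
    return version - base, base - version


def higher_order_conflicts(base, branch_a, branch_b, merged):
    """Return merged-model properties that neither branch's changes explain,
    and expected properties that disappeared after the merge."""
    added_a, removed_a = delta(base, branch_a)
    added_b, removed_b = delta(base, branch_b)
    expected = (base - removed_a - removed_b) | added_a | added_b
    return merged - expected, expected - merged


# Hypothetical behavioral models of one function (sets of likely properties).
base = {"ret >= 0", "arg != null"}
branch_a = base | {"log called"}   # branch A adds logging
branch_b = set(base)               # branch B refactors without changing behavior
merged = {"arg != null", "log called", "ret >= -1"}
unexpected, missing = higher_order_conflicts(base, branch_a, branch_b, merged)
```

Neither branch alone weakens `ret >= 0`, yet the merged version does, which is the signature of a higher-order conflict that textual merging would not report.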

SEJul 24, 2017
Verifying Policy Enforcers

Oliviero Riganelli, Daniela Micucci, Leonardo Mariani et al.

Policy enforcers are sophisticated runtime components that can prevent failures by enforcing the correct behavior of the software. While a single enforcer can be easily designed by focusing only on the behavior of the application that must be monitored, the effect of multiple enforcers that enforce different policies might be hard to predict. So far, interferences between enforcers have been resolved with priority mechanisms and heuristics. Although these methods provide a way to take decisions when multiple enforcers try to affect the execution at the same time, they do not guarantee the absence of interference on the global behavior of the system. In this paper we present a verification strategy that can be exploited to discover interferences between sets of enforcers and thus safely identify a priori the enforcers that can coexist at runtime. In our evaluation, we applied our verification method to several policy enforcers for Android and discovered some incompatibilities.
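The kind of interference described above can be illustrated with two toy enforcers, each correct in isolation, whose composition violates one of the policies. The abstract does not detail the verification strategy, so this sketch swaps in a bounded exhaustive check over all event sequences up to a length bound: unlike a full verification, it can only reveal interference within that bound. The events, policies, and enforcer behaviors are hypothetical.

```python
from itertools import product

EVENTS = ("acquire", "save", "release", "pause")


def enforce_release_on_pause(trace):
    """Enforcer A (hypothetical): insert 'release' before 'pause' if the resource is held."""
    out, held = [], False
    for e in trace:
        if e == "pause" and held:
            out.append("release")
            held = False
        if e == "acquire":
            held = True
        elif e == "release":
            held = False
        out.append(e)
    return tuple(out)


def policy_release_on_pause(trace):
    held = False
    for e in trace:
        if e == "acquire":
            held = True
        elif e == "release":
            held = False
        elif e == "pause" and held:
            return False
    return True


def enforce_save_before_release(trace):
    """Enforcer B (hypothetical): suppress 'release' until data is saved after 'acquire'."""
    out, saved = [], True
    for e in trace:
        if e == "acquire":
            saved = False
        elif e == "save":
            saved = True
        elif e == "release" and not saved:
            continue  # drop the unsafe release
        out.append(e)
    return tuple(out)


def policy_save_before_release(trace):
    saved = True
    for e in trace:
        if e == "acquire":
            saved = False
        elif e == "save":
            saved = True
        elif e == "release" and not saved:
            return False
    return True


def find_interference(enforcers, policies, max_len=3):
    """Run every input trace up to max_len through the enforcer chain and return
    (input, output, violated policy) for the first violation, or None."""
    for n in range(max_len + 1):
        for trace in product(EVENTS, repeat=n):
            out = trace
            for enforce in enforcers:
                out = enforce(out)
            for name, policy in policies.items():
                if not policy(out):
                    return trace, out, name
    return None


policies = {"release-on-pause": policy_release_on_pause,
            "save-before-release": policy_save_before_release}
result = find_interference([enforce_release_on_pause, enforce_save_before_release], policies)
```

Each enforcer satisfies its own policy when deployed alone, but together B removes the `release` that A inserted, so `("acquire", "pause")` ends up violating A's policy. Priority schemes would pick a winner at runtime; a check performed a priori reveals that the pair should not be deployed together.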

SEMay 23, 2017
Timed k-Tail: Automatic Inference of Timed Automata

Fabrizio Pastore, Daniela Micucci, Leonardo Mariani

Accurate and up-to-date models describing the behavior of software systems are seldom available in practice. To address this issue, software engineers may use specification mining techniques, which can automatically derive models that capture the behavior of the system under analysis. So far, most specification mining techniques have focused on the functional behavior of systems, with specific emphasis on models that represent the ordering of operations, such as temporal rules and finite state models. Although useful, these models are inherently partial. For instance, they miss the timing behavior, which is extremely relevant for many classes of systems and components, such as shared libraries and user-driven applications. Mining specifications that include both the functional and the timing aspects can improve the applicability of many testing and analysis solutions. This paper addresses this challenge by presenting the Timed k-Tail (TkT) specification mining technique, which can mine timed automata from program traces. Since timed automata can effectively represent the interplay between the functional and the timing behavior of a system, TkT could be exploited in those contexts where time-related information is relevant. Our empirical evaluation shows that TkT can efficiently and effectively mine accurate models. The mined models have been used to identify executions with anomalous timing. The evaluation shows that most of the anomalous executions were correctly identified, with few false positives.
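The core idea, merging states by their k-event future and attaching timing constraints to transitions, can be sketched as below, together with the use of the mined model to spot anomalous timing. This is a deliberate simplification, not TkT: each trace position is identified by its own next k events (classic k-Tail compares sets of k-futures), and each transition's guard is just the [min, max] delay since the previous event rather than a full set of timed-automaton clocks. Traces and timestamps are hypothetical.

```python
def _steps(trace, k):
    """Yield (source state, event, target state, delay) for each event of a timed trace."""
    events = [e for e, _ in trace]
    for i, (event, ts) in enumerate(trace):
        delay = ts - trace[i - 1][1] if i > 0 else 0
        yield tuple(events[i:i + k]), event, tuple(events[i + 1:i + 1 + k]), delay


def mine(traces, k=2):
    """Simplified Timed k-Tail: a state is the k-event future at a trace position;
    each transition keeps the [min, max] delay observed before its event."""
    guards = {}
    for trace in traces:
        for src, event, dst, delay in _steps(trace, k):
            lo, hi = guards.get((src, event, dst), (delay, delay))
            guards[(src, event, dst)] = (min(lo, delay), max(hi, delay))
    return guards


def first_anomaly(guards, trace, k=2):
    """Index of the first event with unseen behavior or out-of-guard timing, else None."""
    for i, (src, event, dst, delay) in enumerate(_steps(trace, k)):
        guard = guards.get((src, event, dst))
        if guard is None or not guard[0] <= delay <= guard[1]:
            return i
    return None


# Hypothetical training traces: (event, timestamp in ms).
model = mine([
    [("open", 0), ("read", 1000), ("close", 1500)],
    [("open", 0), ("read", 1400), ("close", 1800)],
])
```

With this model, a trace whose `read` arrives 5000 ms after `open` follows a known event ordering but violates the mined `[1000, 1400]` guard, so `first_anomaly` reports index 1. A purely functional model would accept it.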

SEMay 18, 2017
In The Field Monitoring of Interactive Applications

Oscar Cornejo, Daniela Briola, Daniela Micucci et al.

Monitoring techniques can extract accurate data about the behavior of software systems. When used in the field, they can reveal how applications behave in real-world contexts and how programs are actually exercised by their users. Nevertheless, since monitoring might need significant storage and computational resources, it may interfere with users' activities, degrading the quality of the user experience. While the impact of monitoring has typically been studied by measuring the overhead that it may introduce in a monitored application, there is little knowledge about how monitoring solutions may actually impact the user experience and to what extent users may recognize their presence. In this paper, we present our investigation on how collecting data in the field may impact the quality of the user experience. Our initial results show that non-trivial overhead can be tolerated by users, depending on the kind of activity that is performed. This opens interesting opportunities for research in monitoring solutions, which could be designed to opportunistically

SEMar 23, 2017
Policy Enforcement with Proactive Libraries

Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Software libraries implement APIs that deliver reusable functionalities. To correctly use these functionalities, software applications must satisfy certain correctness policies, for instance policies about the order in which some API methods can be invoked and about the values that can be used for their parameters. If these policies are violated, applications may produce misbehaviors and failures at runtime. Although this problem is general, applications that incorrectly use API methods are more frequent in certain contexts. For instance, Android provides a rich and rapidly evolving set of APIs that might be used incorrectly by app developers, who often implement and publish faulty apps in the marketplaces. To mitigate this problem, we introduce the novel notion of proactive library, which augments classic libraries with the capability of proactively detecting and healing misuses at runtime. Proactive libraries blend libraries with multiple proactive modules that collect data, check the correctness policies of the libraries, and heal executions as soon as the violation of a correctness policy is detected. The proactive modules can be activated or deactivated at runtime by the users and can be implemented without requiring any change to the original library or any knowledge about the applications that may use the library. We evaluated proactive libraries in the context of the Android ecosystem. Results show that proactive libraries can automatically overcome several problems related to bad resource usage at the cost of a small overhead.
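The structure of a proactive module (observe library calls, check a correctness policy, heal on violation, switch on and off at runtime, leave the library untouched) can be sketched as follows. Everything here is hypothetical: `Camera` is a stand-in class, not an Android API, the release-before-pause policy is an illustrative resource-usage rule, and interception by replacing a method at runtime is one possible mechanism chosen for brevity.

```python
class Camera:
    """Stand-in for a library resource (hypothetical, not a real Android API)."""

    def __init__(self):
        self.opened = False

    def open(self):
        self.opened = True

    def release(self):
        self.opened = False


class ReleaseOnPauseModule:
    """Hypothetical proactive module for the policy 'an opened Camera must be
    released before the app pauses': collects data, checks, and heals."""

    def __init__(self):
        self.enabled = True  # modules can be switched off at runtime
        self.tracked = []
        self.healed = 0

    def on_open(self, cam):
        self.tracked.append(cam)  # data collection

    def on_pause(self):
        if not self.enabled:
            return
        for cam in self.tracked:
            if cam.opened:      # policy violated: still open at pause
                cam.release()   # healing action on the app's behalf
                self.healed += 1
        self.tracked.clear()


def instrument(module):
    """Intercept Camera.open without editing the Camera class source."""
    original_open = Camera.open

    def open_and_notify(self):
        original_open(self)
        module.on_open(self)

    Camera.open = open_and_notify


module = ReleaseOnPauseModule()
instrument(module)
cam = Camera()
cam.open()         # the app acquires the camera...
module.on_pause()  # ...and is paused without releasing it
```

After `on_pause`, the camera is released even though the app never called `release()`. Setting `module.enabled = False` leaves the misuse unhealed, mirroring the runtime activation and deactivation of modules.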

SEJan 19, 2017
Healing Data Loss Problems in Android Apps

Oliviero Riganelli, Daniela Micucci, Leonardo Mariani

Android apps should be designed to cope with stop-start events, which are the events that require stopping and restoring the execution of an app while leaving its state unaltered. These events can be caused by runtime configuration changes, such as a screen rotation, and by context switches, such as a switch from one app to another. When a stop-start event occurs, Android saves the state of the app, handles the event, and finally restores the saved state. To let Android save and restore the state correctly, apps must provide the appropriate support. Unfortunately, Android developers often implement this support incorrectly, or do not implement it at all. This bad practice makes apps react incorrectly to stop-start events, thus generating what we call data loss problems, that is, Android apps that lose user data, behave unexpectedly, or crash because program variables lost their values. Data loss problems are difficult to detect because they might be observed only when apps are in specific states and receive specific inputs. Covering all the possible cases with testing may require a large number of test cases whose execution must be checked manually to discover whether the app under test has been correctly restored after each stop-start event. It is thus important to complement traditional in-house testing activities with mechanisms that can protect apps as soon as a data loss problem occurs in the field. In this paper we present DataLossHealer, a technique for automatically identifying and healing data loss problems in the field as soon as they occur. DataLossHealer checks at runtime whether states are recovered correctly, and heals the app when needed. DataLossHealer can learn from experience, incrementally reducing the overhead it introduces by avoiding monitoring interactions that the app has handled correctly in the past.
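The check-and-heal loop and the learning step described above can be sketched with app state modeled as a dictionary. This is a minimal sketch, not DataLossHealer's implementation: the snapshot-and-compare strategy, the `trust_after` threshold for skipping monitoring of (activity, event) pairs, and the activity and field names are all assumptions made for illustration.

```python
class DataLossHealer:
    """Hypothetical sketch: snapshot state before a stop-start event, compare it
    after restoration, heal lost values, and stop monitoring (activity, event)
    pairs that were restored correctly `trust_after` consecutive times."""

    def __init__(self, trust_after=3):
        self.trust_after = trust_after
        self.correct = {}  # (activity, event) -> consecutive correct restorations
        self.snapshot = None

    def should_monitor(self, activity, event):
        return self.correct.get((activity, event), 0) < self.trust_after

    def before(self, activity, event, state):
        if self.should_monitor(activity, event):
            self.snapshot = dict(state)

    def after(self, activity, event, state):
        """Heal the restored state in place and return the names of lost fields."""
        if self.snapshot is None:
            return []  # this interaction is no longer monitored
        lost = [k for k, v in self.snapshot.items() if state.get(k) != v]
        for k in lost:
            state[k] = self.snapshot[k]  # heal by restoring the saved value
        key = (activity, event)
        self.correct[key] = 0 if lost else self.correct.get(key, 0) + 1
        self.snapshot = None
        return lost


healer = DataLossHealer(trust_after=2)
state = {"draft": "hello", "page": 3}
healer.before("EditActivity", "rotate", state)
restored = {"page": 3}  # buggy restoration: the draft text was not saved
lost = healer.after("EditActivity", "rotate", restored)
```

Here `lost` is `["draft"]` and the restored state gets its draft back. Once a pair such as `("ListActivity", "rotate")` has been restored correctly `trust_after` times in a row, `should_monitor` returns `False` and no snapshot is taken, which is where the overhead reduction comes from. The trade-off is that a later regression in that pair would go unnoticed.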