Mariano Ceccato

SE
11papers
243citations
Novelty41%
AI Score44

11 Papers

30.5CRApr 17Code
PolicyGapper: Automated Detection of Inconsistencies Between Google Play Data Safety Sections and Privacy Policies Using LLMs

Luca Ferrari, Billel Habbati, Meriem Guerar et al.

Mobile application developers are required to disclose how they collect, use, and share user data in compliance with privacy regulations. To support transparency, major app marketplaces have introduced standardized disclosure mechanisms. In 2022, Google mandated the Data Safety Section (DSS) on Google Play, requiring developers to summarize their data practices. However, compiling accurate DSS disclosures is challenging, as they must remain consistent with the corresponding privacy policy (PP), and no automated tool currently verifies this alignment. Prior studies indicate that nearly 80% of popular apps contain incomplete or misleading DSS declarations. We present PolicyGapper, an LLM-based methodology for automatically detecting discrepancies between DSS disclosures and privacy policies. PolicyGapper operates in four stages: scraping, pre-processing, analysis, and post-processing, without requiring access to application binaries. We evaluate PolicyGapper on a dataset of 330 top-ranked apps spanning all 33 Google Play categories, collected in Q3 2025. The approach identifies 2,689 omitted disclosures, including 2,040 related to data collection and 649 to data sharing. Manual validation on a stratified 10% subset, repeated across three independent runs, yields an average Precision of 0.75, Recall of 0.77, Accuracy of 0.69, and F1-score of 0.76. To support reproducibility, we release a complete replication package, including the dataset, prompts, source code, and results available at https://github.com/Mobile-IoT-Security-Lab/PolicyGapper and https://doi.org/10.5281/zenodo.19628493.

28.4SEMay 20
Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

Wei Minn, Yan Naing Tun, Biniam Fesseha Demissie et al.

Dockerfiles enable the creation of portable container-based execution environments for the application code, and have become an important part of the modern software development process. As Dockerfiles are a form of Infrastructure-as-Code (IaC), they can include temporary workarounds and other suboptimal implementations, leading to the accrual of technical debt that affects their reliability, security, and maintainability in the future. Prior work characterized self-admitted technical debt (SATD) in Dockerfile comments and the surrounding file chunks. This single-file view is incomplete since source code evolution involves changes across different types of software artifacts such as production, test, build, and other configuration files. Thus, we address this gap by studying SATD events in Dockerfiles alongside the related source code. We find that approximately 27% of admission events and 40% of repayment events are coupled to non-Dockerfile artifacts, and coupling sources are subtype-specific. We also observed that coupled SATD in general are repaid significantly faster overall (p = 0.0201), while coupled SATD regarding missing functionalities persists longer than its isolated counterparts; Lastly, we conducted open and axial coding of coupled SATD events, and we observe that external dependency issues, more particularly regarding unreleased upstream packages and bug fixes, are the most common cause of admission triggers in the source code; we also observe that architectural refactoring is the most common prerequisite for the repayment of SATD in Dockerfiles. These findings indicate that both practitioners (e.g. developers and project managers) and SATD researchers should integrate the source code-side co-evolution, rather than the single-file view, as the primary unit of analysis.

SEMar 16, 2021
EtherSolve: Computing an Accurate Control-Flow Graph from Ethereum Bytecode

Filippo Contro, Marco Crosara, Mariano Ceccato et al. · mila

Motivated by the immutable nature of Ethereum smart contracts and of their transactions, quite many approaches have been proposed to detect defects and security problems before smart contracts become persistent in the blockchain and they are granted control on substantial financial value. Because smart contracts source code might not be available, static analysis approaches mostly face the challenge of analysing compiled Ethereum bytecode, that is available directly from the official blockchain. However, due to the intrinsic complexity of Ethereum bytecode (especially in jump resolution), static analysis encounters significant obstacles that reduce the accuracy of exiting automated tools. This paper presents a novel static analysis algorithm based on the symbolic execution of the Ethereum operand stack that allows us to resolve jumps in Ethereum bytecode and to construct an accurate control-flow graph (CFG) of the compiled smart contracts. EtherSolve is a prototype implementation of our approach. Experimental results on a significant set of real world Ethereum smart contracts show that EtherSolve improves the accuracy of the execrated CFGs with respect to the state of the art available approaches. Many static analysis techniques are based on the CFG representation of the code and would therefore benefit from the accurate extraction of the CFG. For example, we implemented a simple extension of EtherSolve that allows to detect instances of the re-entrancy vulnerability.

SEAug 18, 2021
Restats: A Test Coverage Tool for RESTful APIs

Davide Corradini, Amedeo Zampieri, Michele Pasqua et al.

Test coverage is a standard measure used to evaluate the completeness of a test suite. Coverage is typically computed on source code, by assessing the extent of source code entities (e.g., statements, data dependencies, control dependencies) that are exercised when running test cases. When considering REST APIs, an alternative perspective to assess test suite completeness is with respect to the service definition. This paper presents Restats, a test coverage tool for REST APIs that supports eight state-of-the-art test coverage metrics with a black-box perspective, i.e., only relying on the OpenAPI interface specification of the REST API under test. In fact, metrics are computed by only observing the HTTP requests and responses occurring at testing time, and no access to source/compiled code of the REST API is required. These coverage metrics come in handy for: (i) developers and test engineers working at development and maintenance tasks; (ii) stakeholders and customers who want to evaluate the completeness of acceptance tests; (iii) researches interested in comparing different automated test case generation strategies.

SEAug 18, 2021
Empirical Comparison of Black-box Test Case Generation Tools for RESTful APIs

Davide Corradini, Amedeo Zampieri, Michele Pasqua et al.

In literature, we can find research tools to automatically generate test cases for RESTful APIs, addressing the specificity of this particular programming domain. However, no direct comparison of these tools is available to guide developers in deciding which tool best fits their REST API project. In this paper, we present the results of an empirical comparison of automated black-box test case generation approaches for REST APIs. We surveyed the available black-box testing tools that have been proposed in recent literature, finding four usable prototypes: RestTestGen, RESTler, bBOXRT and RESTest. We used these tools to generate test cases for 14 real-world REST services. Then, testing results have been analyzed and compared in terms of robustness (i.e., success rate) and test coverage. Among the considered tools, RESTler appears to be the most solid, able to successfully test all case studies (the other tools experienced crashes). Conversely, test cases generated by RestTestGen scored the highest coverage, suggesting that its testing strategy is the most effective in testing REST APIs.

SEJan 7, 2021
Deep Reinforcement Learning for Black-Box Testing of Android Apps

Andrea Romdhana, Alessio Merlo, Mariano Ceccato et al.

The state space of Android apps is huge and its thorough exploration during testing remains a major challenge. In fact, the best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward, rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as the one of Android apps. However, state of the art, publicly available tools only support basic, tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, which include state of the art RL based tools, such as TimeMachine and Q-Testing. We also investigated qualitatively the reasons behind such performance and we have identified the key features of Android apps that make Deep RL particularly effective on them to be the presence of chained and blocking activities.

SEFeb 5, 2020
A Framework for In-Vivo Testing of Mobile Applications

Mariano Ceccato, Davide Corradini, Luca Gazzola et al.

The ecosystem in which mobile applications run is highly heterogeneous and configurable. All layers upon which mobile apps are built offer wide possibilities of variations, from the device and the hardware, to the operating system and middleware, up to the user preferences and settings. Testing all possible configurations exhaustively, before releasing the app, is unaffordable. As a consequence, the app may exhibit different, including faulty, behaviours when executed in the field, under specific configurations. In this paper, we describe a framework that can be instantiated to support in-vivo testing of a mobile app. The framework monitors the configuration in the field and triggers in-vivo testing when an untested configuration is recognized. Experimental results show that the overhead introduced by monitoring is unnoticeable to negligible (i.e., 0-6%) depending on the device being used (high- vs. low-end). In-vivo test execution required on average 3s: if performed upon screen lock activation, it introduces just a slight delay before locking the device.

SEJan 15, 2019
Obfuscating Java Programs by Translating Selected Portions of Bytecode to Native Libraries

Davide Pizzolotto, Mariano Ceccato

Code obfuscation is a popular approach to turn program comprehension and analysis harder, with the aim of mitigating threats related to malicious reverse engineering and code tampering. However, programming languages that compile to high level bytecode (e.g., Java) can be obfuscated only to a limited extent. In fact, high level bytecode still contains high level relevant information that an attacker might exploit. In order to enable more resilient obfuscations, part of these programs might be implemented with programming languages (e.g., C) that compile to low level machine-dependent code. In fact, machine code contains and leaks less high level information and it enables more resilient obfuscations. In this paper, we present an approach to automatically translate critical sections of high level Java bytecode to C code, so that more effective obfuscations can be resorted to. Moreover, a developer can still work with a single programming language, i.e., Java.

SEDec 19, 2018
AnFlo: Detecting Anomalous Sensitive Information Flows in Android Apps

Biniam Fisseha Demissie, Mariano Ceccato, Lwin Khin Shar

Smartphone apps usually have access to sensitive user data such as contacts, geo-location, and account credentials and they might share such data to external entities through the Internet or with other apps. Confidentiality of user data could be breached if there are anomalies in the way sensitive data is handled by an app which is vulnerable or malicious. Existing approaches that detect anomalous sensitive data flows have limitations in terms of accuracy because the definition of anomalous flows may differ for different apps with different functionalities; it is normal for "Health" apps to share heart rate information through the Internet but is anomalous for "Travel" apps. In this paper, we propose a novel approach to detect anomalous sensitive data flows in Android apps, with improved accuracy. To achieve this objective, we first group trusted apps according to the topics inferred from their functional descriptions. We then learn sensitive information flows with respect to each group of trusted apps. For a given app under analysis, anomalies are identified by comparing sensitive information flows in the app against those flows learned from trusted apps grouped under the same topic. In the evaluation, information flow is learned from 11,796 trusted apps. We then checked for anomalies in 596 new (benign) apps and identified 2 previously-unknown vulnerable apps related to anomalous flows. We also analyzed 18 malware apps and found anomalies in 6 of them.

SEApr 10, 2017
How Professional Hackers Understand Protected Code while Performing Attack Tasks

Mariano Ceccato, Paolo Tonella, Cataldo Basile et al.

Code protections aim at blocking (or at least delaying) reverse engineering and tampering attacks to critical assets within programs. Knowing the way hackers understand protected code and perform attacks is important to achieve a stronger protection of the software assets, based on realistic assumptions about the hackers' behaviour. However, building such knowledge is difficult because hackers can hardly be involved in controlled experiments and empirical studies. The FP7 European project Aspire has given the authors of this paper the unique opportunity to have access to the professional penetration testers employed by the three industrial partners. In particular, we have been able to perform a qualitative analysis of three reports of professional penetration test performed on protected industrial code. Our qualitative analysis of the reports consists of open coding, carried out by 7 annotators and resulting in 459 annotations, followed by concept extraction and model inference. We identified the main activities: understanding, building attack, choosing and customizing tools, and working around or defeating protections. We built a model of how such activities take place. We used such models to identify a set of research directions for the creation of stronger code protections.

SEApr 7, 2017
Assessment of Source Code Obfuscation Techniques

Alessio Viticchié, Leonardo Regano, Marco Torchiano et al.

Obfuscation techniques are a general category of software protections widely adopted to prevent malicious tampering of the code by making applications more difficult to understand and thus harder to modify. Obfuscation techniques are divided in code and data obfuscation, depending on the protected asset. While preliminary empirical studies have been conducted to determine the impact of code obfuscation, our work aims at assessing the effectiveness and efficiency in preventing attacks of a specific data obfuscation technique - VarMerge. We conducted an experiment with student participants performing two attack tasks on clear and obfuscated versions of two applications written in C. The experiment showed a significant effect of data obfuscation on both the time required to complete and the successful attack efficiency. An application with VarMerge reduces by six times the number of successful attacks per unit of time. This outcome provides a practical clue that can be used when applying software protections based on data obfuscation.