CR | Apr 17
PolicyGapper: Automated Detection of Inconsistencies Between Google Play Data Safety Sections and Privacy Policies Using LLMs
Luca Ferrari, Billel Habbati, Meriem Guerar et al.
Mobile application developers are required to disclose how they collect, use, and share user data in compliance with privacy regulations. To support transparency, major app marketplaces have introduced standardized disclosure mechanisms. In 2022, Google mandated the Data Safety Section (DSS) on Google Play, requiring developers to summarize their data practices. However, compiling accurate DSS disclosures is challenging, as they must remain consistent with the corresponding privacy policy (PP), and no automated tool currently verifies this alignment. Prior studies indicate that nearly 80% of popular apps contain incomplete or misleading DSS declarations. We present PolicyGapper, an LLM-based methodology for automatically detecting discrepancies between DSS disclosures and privacy policies. PolicyGapper operates in four stages (scraping, pre-processing, analysis, and post-processing) and does not require access to application binaries. We evaluate PolicyGapper on a dataset of 330 top-ranked apps spanning all 33 Google Play categories, collected in Q3 2025. The approach identifies 2,689 omitted disclosures, of which 2,040 relate to data collection and 649 to data sharing. Manual validation on a stratified 10% subset, repeated across three independent runs, yields an average Precision of 0.75, Recall of 0.77, Accuracy of 0.69, and F1-score of 0.76. To support reproducibility, we release a complete replication package, including the dataset, prompts, source code, and results, available at https://github.com/Mobile-IoT-Security-Lab/PolicyGapper and https://doi.org/10.5281/zenodo.19628493.
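The figures reported in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is not part of the PolicyGapper replication package; the function and variable names are hypothetical. It assumes the standard definition of the F1-score as the harmonic mean of precision and recall. Because the reported values are averages over three runs, the identity is only expected to hold approximately.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (averages across three runs).
precision, recall = 0.75, 0.77
f1 = f1_score(precision, recall)
print(f"F1 from reported precision/recall: {f1:.2f}")

# Omitted disclosures as reported: collection plus sharing should give the total.
collection_omissions, sharing_omissions = 2040, 649
total_omissions = collection_omissions + sharing_omissions
print(f"Total omitted disclosures: {total_omissions}")
```

Both checks agree with the abstract: the recomputed F1 rounds to 0.76, and the two categories sum to 2,689. Accuracy cannot be recomputed the same way, since the abstract does not report the underlying confusion-matrix counts.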
CR | Mar 2, 2021
Gotta CAPTCHA 'Em All: A Survey of Twenty years of the Human-or-Computer Dilemma
Meriem Guerar, Luca Verderame, Mauro Migliardi et al.
A recent study has found that malicious bots generated nearly a quarter of overall website traffic in 2019 [100]. These malicious bots perform activities such as price and content scraping, account creation and takeover, credit card fraud, and denial of service. Thus, they represent a serious threat to all businesses in general, but are especially troublesome for e-commerce, travel, and financial services. One of the most common defense mechanisms against bots abusing online services is the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), so it is extremely important to understand which CAPTCHA schemes have been designed and how effective they actually are against ever-evolving bots. To this end, this work provides an overview of the current state of the art in CAPTCHA schemes and defines a new classification that includes all the emerging schemes. In addition, for each identified CAPTCHA category, it summarizes the most successful attack methods, describes how CAPTCHA schemes evolved to resist bot attacks, and discusses the limitations of the different schemes from the security, usability, and compatibility points of view. Finally, an assessment of the open issues, challenges, and opportunities for further study is provided, paving the road toward the design of next-generation secure and user-friendly CAPTCHA schemes.