AIOct 26, 2023
Exploring the Potential of Generative AI for the World Wide WebNouar AlDahoul, Joseph Hong, Matteo Varvello et al.
Generative Artificial Intelligence (AI) is a cutting-edge technology capable of producing text, images, and various media content leveraging generative models and user prompts. Between 2022 and 2023, generative AI surged in popularity with a plethora of applications spanning from AI-powered movies to chatbots. In this paper, we delve into the potential of generative AI within the realm of the World Wide Web, specifically focusing on image generation. Web developers already harness generative AI to help crafting text and images, while Web browsers might use it in the future to locally generate images for tasks like repairing broken webpages, conserving bandwidth, and enhancing privacy. To explore this research area, we have developed WebDiffusion, a tool that allows to simulate a Web powered by stable diffusion, a popular text-to-image model, from both a client and server perspective. WebDiffusion further supports crowdsourcing of user opinions, which we use to evaluate the quality and accuracy of 409 AI-generated images sourced from 60 webpages. Our findings suggest that generative AI is already capable of producing pertinent and high-quality Web images, even without requiring Web designers to manually input prompts, just by leveraging contextual information available within the webpages. However, we acknowledge that direct in-browser image generation remains a challenge, as only highly powerful GPUs, such as the A40 and A100, can (partially) compete with classic image downloads. Nevertheless, this approach could be valuable for a subset of the images, for example when fixing broken webpages or handling highly private content.
CYMay 7, 2025
Large Language Models are often politically extreme, usually ideologically inconsistent, and persuasive even in informational contextsNouar Aldahoul, Hazem Ibrahim, Matteo Varvello et al.
Large Language Models (LLMs) are a transformational technology, fundamentally changing how people obtain information and interact with the world. As people become increasingly reliant on them for an enormous variety of tasks, a body of academic research has developed to examine these models for inherent biases, especially political biases, often finding them small. We challenge this prevailing wisdom. First, by comparing 31 LLMs to legislators, judges, and a nationally representative sample of U.S. voters, we show that LLMs' apparently small overall partisan preference is the net result of offsetting extreme views on specific topics, much like moderate voters. Second, in a randomized experiment, we show that LLMs can promulgate their preferences into political persuasiveness even in information-seeking contexts: voters randomized to discuss political issues with an LLM chatbot are as much as 5 percentage points more likely to express the same preferences as that chatbot. Contrary to expectations, these persuasive effects are not moderated by familiarity with LLMs, news consumption, or interest in politics. LLMs, especially those controlled by private companies or governments, may become a powerful and targeted vector for political influence.
CLAug 25, 2025
Not All Visitors are Bilingual: A Measurement Study of the Multilingual Web from an Accessibility PerspectiveMasudul Hasan Masud Bhuiyan, Matteo Varvello, Yasir Zaki et al.
English is the predominant language on the web, powering nearly half of the world's top ten million websites. Support for multilingual content is nevertheless growing, with many websites increasingly combining English with regional or native languages in both visible content and hidden metadata. This multilingualism introduces significant barriers for users with visual impairments, as assistive technologies like screen readers frequently lack robust support for non-Latin scripts and misrender or mispronounce non-English text, compounding accessibility challenges across diverse linguistic contexts. Yet, large-scale studies of this issue have been limited by the lack of comprehensive datasets on multilingual web content. To address this gap, we introduce LangCrUX, the first large-scale dataset of 120,000 popular websites across 12 languages that primarily use non-Latin scripts. Leveraging this dataset, we conduct a systematic analysis of multilingual web accessibility and uncover widespread neglect of accessibility hints. We find that these hints often fail to reflect the language diversity of visible content, reducing the effectiveness of screen readers and limiting web accessibility. We finally propose Kizuki, a language-aware automated accessibility testing extension to account for the limited utility of language-inconsistent accessibility hints.
PFFeb 13, 2025
PixLift: Accelerating Web Browsing via AI UpscalingYonas Atinafu, Sarthak Malla, HyunSeok Daniel Jang et al.
Accessing the internet in regions with expensive data plans and limited connectivity poses significant challenges, restricting information access and economic growth. Images, as a major contributor to webpage sizes, exacerbate this issue, despite advances in compression formats like WebP and AVIF. The continued growth of complex and curated web content, coupled with suboptimal optimization practices in many regions, has prevented meaningful reductions in web page sizes. This paper introduces PixLift, a novel solution to reduce webpage sizes by downscaling their images during transmission and leveraging AI models on user devices to upscale them. By trading computational resources for bandwidth, PixLift enables more affordable and inclusive web access. We address key challenges, including the feasibility of scaled image requests on popular websites, the implementation of PixLift as a browser extension, and its impact on user experience. Through the analysis of 71.4k webpages, evaluations of three mainstream upscaling models, and a user study, we demonstrate PixLift's ability to significantly reduce data usage without compromising image quality, fostering a more equitable internet.
CRSep 7, 2021
POW-HOW: An enduring timing side-channel to evade online malware sandboxesAntonio Nappa, Panagiotis Papadopoulos, Matteo Varvello et al.
Online malware scanners are one of the best weapons in the arsenal of cybersecurity companies and researchers. A fundamental part of such systems is the sandbox that provides an instrumented and isolated environment (virtualized or emulated) for any user to upload and run unknown artifacts and identify potentially malicious behaviors. The provided API and the wealth of information inthe reports produced by these services have also helped attackers test the efficacy of numerous techniques to make malware hard to detect.The most common technique used by malware for evading the analysis system is to monitor the execution environment, detect the presence of any debugging artifacts, and hide its malicious behavior if needed. This is usually achieved by looking for signals suggesting that the execution environment does not belong to a the native machine, such as specific memory patterns or behavioral traits of certain CPU instructions. In this paper, we show how an attacker can evade detection on such online services by incorporating a Proof-of-Work (PoW) algorithm into a malware sample. Specifically, we leverage the asymptotic behavior of the computational cost of PoW algorithms when they run on some classes of hardware platforms to effectively detect a non bare-metal environment of the malware sandbox analyzer. To prove the validity of this intuition, we design and implement the POW-HOW framework, a tool to automatically implement sandbox detection strategies and embed a test evasion program into an arbitrary malware sample. Our empirical evaluation shows that the proposed evasion technique is durable, hard to fingerprint, and reduces existing malware detection rate by a factor of 10. Moreover, we show how bare-metal environments cannot scale with actual malware submissions rates for consumer services.
SEJun 15, 2021
Muzeel: A Dynamic JavaScript Analyzer for Dead Code Elimination in Today's WebTofunmi Kupoluyi, Moumena Chaqfeh, Matteo Varvello et al.
JavaScript contributes to the increasing complexity of today's web. To support user interactivity and accelerate the development cycle, web developers heavily rely on large general-purpose third-party JavaScript libraries. This practice increases the size and the processing complexity of a web page by bringing additional functions that are not used by the page but unnecessarily downloaded and processed by the browser. In this paper, an analysis of around 40,000 web pages shows that 70% of JavaScript functions on the median page are unused, and the elimination of these functions would contribute to the reduction of the page size by 60%. Motivated by these findings, we propose Muzeel (which means eliminator in Arabic); a solution for eliminating JavaScript functions that are not used in a given web page (commonly referred to as dead code). Muzeel extracts all of the page event listeners upon page load, and emulates user interactions using a bot that triggers each of these events, in order to eliminate the dead code of functions that are not called by any of these events. Our evaluation results spanning several Android mobile phones and browsers show that Muzeel speeds up the page load by around 30% on low-end phones, and by 25% on high-end phones under 3G network. It also reduces the speed index (which is an important user experience metric) by 23% and 21% under the same network on low-end, and high-end phones, respectively. Additionally, Muzeel reduces the overall download size while maintaining the visual content and interactive functionality of the pages.
CRApr 28, 2020
A Retrospective Analysis of User Exposure to (Illicit) Cryptocurrency Mining on the WebRalph Holz, Diego Perino, Matteo Varvello et al.
In late 2017, a sudden proliferation of malicious JavaScript was reported on the Web: browser-based mining exploited the CPU time of website visitors to mine the cryptocurrency Monero. Several studies measured the deployment of such code and developed defenses. However, previous work did not establish how many users were really exposed to the identified mining sites and whether there was a real risk given common user browsing behavior. In this paper, we present a retroactive analysis to close this research gap. We pool large-scale, longitudinal data from several vantage points, gathered during the prime time of illicit cryptomining, to measure the impact on web users. We leverage data from passive traffic monitoring of university networks and a large European ISP, with suspected mining sites identified in previous active scans. We corroborate our results with data from a browser extension with a large user base that tracks site visits. We also monitor open HTTP proxies and the Tor network for malicious injection of code. We find that the risk for most Web users was always very low, much lower than what deployment scans suggested. Any exposure period was also very brief. However, we also identify a previously unknown and exploited attack vector on mobile devices.
CRNov 18, 2019
ZKSENSE: A Friction-less Privacy-Preserving Human Attestation Mechanism for Mobile DevicesIñigo Querejeta-Azurmendi, Panagiotis Papadopoulos, Matteo Varvello et al.
Recent studies show that 20.4% of the internet traffic originates from automated agents. To identify and block such ill-intentioned traffic, mechanisms that verify the humanness of the user are widely deployed, with CAPTCHAs being the most popular. Traditional CAPTCHAs require extra user effort (e.g., solving mathematical puzzles), which can severely downgrade the end-user's experience, especially on mobile, and provide sporadic humanness verification of questionable accuracy. More recent solutions like Google's reCAPTCHA v3, leverage user data, thus raising significant privacy concerns. To address these issues, we present zkSENSE: the first zero-knowledge proof-based humanness attestation system for mobile devices. zkSENSE moves the human attestation to the edge: onto the user's very own device, where humanness of the user is assessed in a privacy-preserving and seamless manner. zkSENSE achieves this by classifying motion sensor outputs of the mobile device, based on a model trained by using both publicly available sensor data and data collected from a small group of volunteers. To ensure the integrity of the process, the classification result is enclosed in a zero-knowledge proof of humanness that can be safely shared with a remote server. We implement zkSENSE as an Android service to demonstrate its effectiveness and practicality. In our evaluation, we show that zkSENSE successfully verifies the humanness of a user across a variety of attacking scenarios and demonstrates 92% accuracy. On a two years old Samsung S9, zkSENSE's attestation takes around 3 seconds (when visual CAPTCHAs need 9.8 seconds) and consumes a negligible amount of battery.
NIOct 1, 2019
VPN0: A Privacy-Preserving Decentralized Virtual Private NetworkMatteo Varvello, Iñigo Querejeta Azurmendi, Antonio Nappa et al.
Distributed Virtual Private Networks (dVPNs) are new VPN solutions aiming to solve the trust-privacy concern of a VPN's central authority by leveraging a distributed architecture. In this paper, we first review the existing dVPN ecosystem and debate on its privacy requirements. Then, we present VPN0, a dVPN with strong privacy guarantees and minimal performance impact on its users. VPN0 guarantees that a dVPN node only carries traffic it has "whitelisted", without revealing its whitelist or knowing the traffic it tunnels. This is achieved via three main innovations. First, an attestation mechanism which leverages TLS to certify a user visit to a specific domain. Second, a zero knowledge proof to certify that some incoming traffic is authorized, e.g., falls in a node's whitelist, without disclosing the target domain. Third, a dynamic chain of VPN tunnels to both increase privacy and guarantee service continuation while traffic certification is in place. The paper demonstrates VPN0 functioning when integrated with several production systems, namely BitTorrent DHT and ProtonVPN.
NIFeb 7, 2019
EYEORG: A Platform For Crowdsourcing Web Quality Of Experience MeasurementsMatteo Varvello, Jeremy Blackburn, David Naylor et al.
Tremendous effort has gone into the ongoing battle to make webpages load faster. This effort has culminated in new protocols (QUIC, SPDY, and HTTP/2) as well as novel content delivery mechanisms. In addition, companies like Google and SpeedCurve investigated how to measure "page load time" (PLT) in a way that captures human perception. In this paper we present Eyeorg, a platform for crowdsourcing web quality of experience measurements. Eyeorg overcomes the scaling and automation challenges of recruiting users and collecting consistent user-perceived quality measurements. We validate Eyeorg's capabilities via a set of 100 trusted participants. Next, we showcase its functionalities via three measurement campaigns, each involving 1,000 paid participants, to 1) study the quality of several PLT metrics, 2) compare HTTP/1.1 and HTTP/2 performance, and 3) assess the impact of online advertisements and ad blockers on user experience. We find that commonly used, and even novel and sophisticated PLT metrics fail to represent actual human perception of PLT, that the performance gains from HTTP/2 are imperceivable in some circumstances, and that not all ad blockers are created equal.