Panagiotis Papadopoulos

CR
18 papers
469 citations
Novelty 55%
AI Score 28

18 Papers

LG Feb 26, 2023
P4L: Privacy Preserving Peer-to-Peer Learning for Infrastructureless Setups

Ioannis Arapakis, Panagiotis Papadopoulos, Kleomenis Katevas et al.

Distributed (or Federated) learning enables users to train machine learning models on their very own devices, while they share only the gradients of their models, usually in a differentially private way (which incurs utility loss). Although such a strategy provides better privacy guarantees than the traditional centralized approach, it requires users to blindly trust a centralized infrastructure that may also become a bottleneck as the number of users increases. In this paper, we design and implement P4L: a privacy-preserving peer-to-peer learning system for users to participate in an asynchronous, collaborative learning scheme without requiring any sort of infrastructure or relying on differential privacy. Our design uses strong cryptographic primitives to preserve both the confidentiality and utility of the shared gradients, and a set of peer-to-peer mechanisms for fault tolerance and user churn, proximity and cross-device communications. Extensive simulations under different network settings and ML scenarios for three real-life datasets show that P4L provides competitive performance to baselines, while it is resilient to different poisoning attacks. We implement P4L and experimental results show that the performance overhead and power consumption are minimal (less than 3mAh of discharge).

CY Dec 13, 2022
FNDaaS: Content-agnostic Detection of Fake News sites

Panagiotis Papadopoulos, Dimitris Spithouris, Evangelos P. Markatos et al.

Automatic fake news detection is a challenging problem in misinformation spreading, and it has tremendous real-world political and social impacts. Past studies have proposed machine learning-based methods for detecting such fake news, focusing on different properties of the published news articles, such as linguistic characteristics of the actual content, which however have limitations due to the apparent language barriers. Departing from such efforts, we propose Fake News Detection-as-a-Service (FNDaaS), the first automatic, content-agnostic fake news detection method that considers new and unstudied features such as network and structural characteristics per news website. This method can be enforced as-a-Service, either at the ISP-side for easier scalability and maintenance, or user-side for better end-user privacy. We demonstrate the efficacy of our method using more than 340K datapoints crawled from existing lists of 637 fake and 1183 real news websites, and by building and testing a proof of concept system that materializes our proposal. Our analysis of data collected from these websites shows that the vast majority of fake news domains are very young and appear to have shorter time periods of an IP associated with their domain than real news ones. By conducting various experiments with machine learning classifiers, we demonstrate that FNDaaS can achieve an AUC score of up to 0.967 on past sites, and up to 77-92% accuracy on newly-flagged ones.
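
To make the reported metric concrete: an AUC of 0.967 means that a randomly chosen fake news site receives a higher classifier score than a randomly chosen real news site about 96.7% of the time. The Python sketch below implements that pairwise definition directly. The function name is illustrative, and the labels and scores in the usage note are made-up toy values, not data from the paper.

```python
def pairwise_auc(labels, scores):
    """Area under the ROC curve via its pairwise (rank) definition.

    labels: 1 for the positive class (here: fake news site), 0 otherwise.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the toy input `pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])`, three of the four positive/negative pairs are ordered correctly, so the result is 0.75.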

CR Sep 7, 2021
POW-HOW: An enduring timing side-channel to evade online malware sandboxes

Antonio Nappa, Panagiotis Papadopoulos, Matteo Varvello et al.

Online malware scanners are one of the best weapons in the arsenal of cybersecurity companies and researchers. A fundamental part of such systems is the sandbox that provides an instrumented and isolated environment (virtualized or emulated) for any user to upload and run unknown artifacts and identify potentially malicious behaviors. The provided API and the wealth of information in the reports produced by these services have also helped attackers test the efficacy of numerous techniques to make malware hard to detect. The most common technique used by malware for evading the analysis system is to monitor the execution environment, detect the presence of any debugging artifacts, and hide its malicious behavior if needed. This is usually achieved by looking for signals suggesting that the execution environment does not belong to a native machine, such as specific memory patterns or behavioral traits of certain CPU instructions. In this paper, we show how an attacker can evade detection on such online services by incorporating a Proof-of-Work (PoW) algorithm into a malware sample. Specifically, we leverage the asymptotic behavior of the computational cost of PoW algorithms when they run on some classes of hardware platforms to effectively detect a non-bare-metal environment of the malware sandbox analyzer. To prove the validity of this intuition, we design and implement the POW-HOW framework, a tool to automatically implement sandbox detection strategies and embed a test evasion program into an arbitrary malware sample. Our empirical evaluation shows that the proposed evasion technique is durable, hard to fingerprint, and reduces existing malware detection rate by a factor of 10. Moreover, we show how bare-metal environments cannot scale with actual malware submission rates for consumer services.

CR Jun 3, 2021
THEMIS: A Decentralized Privacy-Preserving Ad Platform with Reporting Integrity

Gonçalo Pestana, Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos et al.

Online advertising fuels the (seemingly) free internet. However, although users can access most of the web services free of charge, they pay a heavy cost on their privacy. They are forced to trust third parties and intermediaries, who not only collect behavioral data but also absorb great amounts of ad revenues. Consequently, more and more users opt out from advertising by resorting to ad blockers, thus costing publishers millions of dollars in lost ad revenues. Although there are various privacy-preserving advertising proposals (e.g., Adnostic, Privad, Brave Ads) from both academia and industry, they all rely on centralized management that users have to blindly trust without being able to audit, while they also fail to guarantee the integrity of the performance analytics they provide to advertisers. In this paper, we design and deploy THEMIS, a novel, decentralized and privacy-by-design ad platform that requires zero trust by users. THEMIS (i) provides auditability to its participants, (ii) rewards users for viewing ads, and (iii) allows advertisers to verify the performance and billing reports of their ad campaigns. By leveraging smart contracts and zero-knowledge schemes, we implement a prototype of THEMIS and early performance evaluation results show that it can scale linearly on a multi-sidechain setup while it supports more than 51M users on a single-sidechain.

SI Mar 16, 2021
The Rise and Fall of Fake News sites: A Traffic Analysis

Manolis Chalkiadakis, Alexandros Kornilakis, Panagiotis Papadopoulos et al.

Over the past decade, we have witnessed the rise of misinformation on the Internet, with online users constantly falling victim to fake news. A multitude of past studies have analyzed fake news diffusion mechanics and detection and mitigation techniques. However, there are still open questions about their operational behavior such as: How old are fake news websites? Do they typically stay online for long periods of time? Do such websites synchronize their up and down time with each other? Do they share similar content through time? Which third-parties support their operations? How much user traffic do they attract, in comparison to mainstream or real news websites? In this paper, we perform a first of its kind investigation to answer such questions regarding the online presence of fake news websites and characterize their behavior in comparison to real news websites. Based on our findings, we build a content-agnostic ML classifier for automatic detection of fake news websites (i.e. accuracy) that are not yet included in manually curated blacklists.

CY Feb 17, 2021
User Tracking in the Post-cookie Era: How Websites Bypass GDPR Consent to Track Users

Emmanouil Papadogiannakis, Panagiotis Papadopoulos, Nicolas Kourtellis et al.

During the past few years, mostly as a result of the GDPR and the CCPA, websites have started to present users with cookie consent banners. These banners are web forms where the users can state their preference and declare which cookies they would like to accept, if such an option exists. Although requesting consent before storing any identifiable information is a good start towards respecting user privacy, previous research has shown that websites do not always respect user choices. Furthermore, considering the ever decreasing reliance of trackers on cookies and the actions browser vendors take by blocking or restricting third-party cookies, we anticipate a world where stateless tracking emerges, either because trackers or websites do not use cookies, or because users simply refuse to accept any. In this paper, we explore whether websites use more persistent and sophisticated forms of tracking in order to track users who said they do not want cookies. Such forms of tracking include first-party ID leaking, ID synchronization, and browser fingerprinting. Our results suggest that websites do use such modern forms of tracking even before users had the opportunity to register their choice with respect to cookies. To add insult to injury, when users choose to raise their voice and reject all cookies, user tracking only intensifies. As a result, users' choices play very little role with respect to tracking: we measured that more than 75% of tracking activities happened before users had the opportunity to make a selection in the cookie consent banner, or when users chose to reject all cookies.

CR Jul 10, 2020
THEMIS: Decentralized and Trustless Ad Platform with Reporting Integrity

Gonçalo Pestana, Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos et al.

Online advertising fuels the (seemingly) free internet. However, although users can access most websites free of charge, they need to pay a heavy cost on their privacy and blindly trust third parties and intermediaries that absorb great amounts of ad revenues and user data. This is one of the reasons users opt out from advertising by resorting to ad blockers that in turn cost publishers millions of dollars in lost ad revenues. Existing privacy-preserving advertising approaches (e.g., Adnostic, Privad, Brave Ads) from both industry and academia cannot guarantee the integrity of the performance analytics they provide to advertisers, while they also rely on centralized management that users have to trust without being able to audit. In this paper, we propose THEMIS, a novel privacy-by-design ad platform that is decentralized and requires zero trust from users. THEMIS (i) provides auditability to all participants, (ii) rewards users for viewing ads, and (iii) allows advertisers to verify the performance and billing reports of their ad campaigns. To demonstrate the feasibility and practicability of our approach, we implemented a prototype of THEMIS using a combination of smart contracts and zero-knowledge schemes. Performance evaluation results show that during ad reward payouts, THEMIS can support more than 51M users on a single-sidechain setup or 153M users on a multi-sidechain setup, thus proving that THEMIS scales linearly.
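
The linear-scaling claim can be checked with simple arithmetic. The sketch below assumes that each sidechain adds the same 51M-user capacity, which the abstract implies but does not state. Under that assumption the 153M figure corresponds to three sidechains. The constant and function names are illustrative and do not come from the THEMIS prototype.

```python
# Capacity of one sidechain during ad reward payouts, as reported in the abstract.
SINGLE_SIDECHAIN_USERS = 51_000_000


def linear_capacity(num_sidechains):
    """Users supported if capacity grows linearly with the number of sidechains
    (assumption: every sidechain contributes the same capacity)."""
    return num_sidechains * SINGLE_SIDECHAIN_USERS


def implied_sidechains(total_users):
    """Number of sidechains a reported total would correspond to under linear scaling."""
    return total_users / SINGLE_SIDECHAIN_USERS
```

Here `implied_sidechains(153_000_000)` gives 3.0, and `linear_capacity(3)` gives back 153,000,000, so the two reported figures are consistent with linear scaling over three sidechains.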

CY Feb 3, 2020
Stop Tracking Me Bro! Differential Tracking Of User Demographics On Hyper-partisan Websites

Pushkal Agarwal, Sagar Joglekar, Panagiotis Papadopoulos et al.

Websites with hyper-partisan, left or right-leaning focus offer content that is typically biased towards the expectations of their target audience. Such content often polarizes users, who are repeatedly primed to specific (extreme) content, usually reflecting hard party lines on political and socio-economic topics. Though this polarization has been extensively studied with respect to content, it is still unknown how it associates with the online tracking experienced by browsing users, especially when they exhibit certain demographic characteristics. For example, it is unclear how such websites enable the ad-ecosystem to track users based on their gender or age. In this paper, we take a first step to shed light on and measure such potential differences in tracking imposed on users when visiting specific party-line websites. For this, we design and deploy a methodology to systematically probe such websites and measure differences in user tracking. This methodology allows us to create user personas with specific attributes like gender and age and automate their browsing behavior in a consistent and repeatable manner. Thus, we systematically study how personas are being tracked by these websites and their third parties, especially if they exhibit particular demographic properties. Overall, we test 9 personas on 556 hyper-partisan websites and find that right-leaning websites tend to track users more intensely than left-leaning ones, depending on user demographics, using both cookies and cookie synchronization methods and leading to more costly delivered ads.

CR Nov 18, 2019
ZKSENSE: A Friction-less Privacy-Preserving Human Attestation Mechanism for Mobile Devices

Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos, Matteo Varvello et al.

Recent studies show that 20.4% of the internet traffic originates from automated agents. To identify and block such ill-intentioned traffic, mechanisms that verify the humanness of the user are widely deployed, with CAPTCHAs being the most popular. Traditional CAPTCHAs require extra user effort (e.g., solving mathematical puzzles), which can severely downgrade the end-user's experience, especially on mobile, and provide sporadic humanness verification of questionable accuracy. More recent solutions like Google's reCAPTCHA v3 leverage user data, thus raising significant privacy concerns. To address these issues, we present zkSENSE: the first zero-knowledge proof-based humanness attestation system for mobile devices. zkSENSE moves the human attestation to the edge: onto the user's very own device, where humanness of the user is assessed in a privacy-preserving and seamless manner. zkSENSE achieves this by classifying motion sensor outputs of the mobile device, based on a model trained by using both publicly available sensor data and data collected from a small group of volunteers. To ensure the integrity of the process, the classification result is enclosed in a zero-knowledge proof of humanness that can be safely shared with a remote server. We implement zkSENSE as an Android service to demonstrate its effectiveness and practicality. In our evaluation, we show that zkSENSE successfully verifies the humanness of a user across a variety of attacking scenarios and demonstrates 92% accuracy. On a two-year-old Samsung S9, zkSENSE's attestation takes around 3 seconds (whereas visual CAPTCHAs need 9.8 seconds) and consumes a negligible amount of battery.

CR Nov 6, 2019
The coin that never sleeps. The privacy preserving usage of Bitcoin in a longitudinal analysis as a speculative asset

Emmanouil Karampinakis, Michalis Pachilakis, Panagiotis Papadopoulos et al.

Bitcoin is the first and undoubtedly most successful cryptocurrency to date with a market capitalization of more than 100 billion dollars. Today, Bitcoin has more than 100,000 supporting merchants and more than 3 million active users. Despite the trust it enjoys among people, Bitcoin lacks a basic feature a substitute currency must have: stability of value. Hence, although the use of Bitcoin as a means of payment is relatively low, the wild ups and downs of its value lure investors to use it as an asset to yield a trading profit. In this study, we explore this exact nature of Bitcoin, aiming to shed light on the newly emerged and rapidly growing marketplace of cryptocurrencies and compare the investment landscape and patterns with the most popular traditional stock market of Dow Jones. Our results show that most Bitcoin addresses are used in the correct fashion to preserve security and privacy of the transactions and that the 24/7 open market of Bitcoin is not affected by any political incidents of the offline world, in contrast to the traditional stock markets. Also, it seems that there are specific longitudes that lead the cryptocurrency in terms of bulk of transactions, but there is not the same correlation with the volume of the coins being transferred.

CR Oct 16, 2019
Filter List Generation for Underserved Regions

Alexander Sjosten, Peter Snyder, Antonio Pastor et al.

Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced, where a large number of people manually label resources related to undesirable web resources (e.g., ads, trackers, paywall libraries), so that they can be blocked by browsers and extensions. Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on "popular" websites, or resources that appear on a large number of "unpopular" websites. A crowd-sourcing strategy will perform poorly for parts of the web with small "crowds", such as regions of the web serving languages with (relatively) few speakers. This work addresses this problem through the combination of two novel techniques: (i) deep browser instrumentation that allows for the accurate generation of request chains, in a way that is robust in situations that confuse existing measurement techniques, and (ii) an ad classifier that uniquely combines perceptual and page-context features to remain accurate across multiple languages. We apply our unique two-step filter list generation pipeline to three regions of the web that currently have poorly maintained filter lists: Sri Lanka, Hungary, and Albania. We generate new filter lists that complement existing filter lists. Our complementary lists block an additional 3,349 ad and ad-related resources (1,771 unique) when applied to 6,475 pages targeting these three regions. We hope that this work can be part of an increased effort at ensuring that the security, privacy, and performance benefits of web resource blocking can be shared with all users, and not only those in dominant linguistic or economic regions.

NI Oct 1, 2019
VPN0: A Privacy-Preserving Decentralized Virtual Private Network

Matteo Varvello, Iñigo Querejeta Azurmendi, Antonio Nappa et al.

Distributed Virtual Private Networks (dVPNs) are new VPN solutions aiming to solve the trust-privacy concern of a VPN's central authority by leveraging a distributed architecture. In this paper, we first review the existing dVPN ecosystem and debate on its privacy requirements. Then, we present VPN0, a dVPN with strong privacy guarantees and minimal performance impact on its users. VPN0 guarantees that a dVPN node only carries traffic it has "whitelisted", without revealing its whitelist or knowing the traffic it tunnels. This is achieved via three main innovations. First, an attestation mechanism which leverages TLS to certify a user visit to a specific domain. Second, a zero knowledge proof to certify that some incoming traffic is authorized, e.g., falls in a node's whitelist, without disclosing the target domain. Third, a dynamic chain of VPN tunnels to both increase privacy and guarantee service continuation while traffic certification is in place. The paper demonstrates VPN0 functioning when integrated with several production systems, namely BitTorrent DHT and ProtonVPN.

CR Jul 24, 2019
YourAdvalue: Measuring Advertising Price Dynamics without Bankrupting User Privacy

Michalis Pachilakis, Panagiotis Papadopoulos, Nikolaos Laoutaris et al.

The Real Time Bidding (RTB) protocol is by now more than a decade old. During this time, a handful of measurement papers have looked at bidding strategies, personal information flow, and cost of display advertising through RTB. In this paper, we present YourAdvalue, a privacy-preserving tool for displaying to end-users in a simple and intuitive manner their advertising value as seen through RTB. Using YourAdvalue, we measure desktop RTB prices in the wild, and compare them with desktop and mobile RTB prices reported by past work. We present how it estimates ad prices that are encrypted, and how it preserves user privacy while reporting results back to a data-server for analysis. We deployed our system, disseminated its browser extension, and collected data from 200 users, including 12000 ad impressions over 11 months. By analyzing this dataset, we show that desktop RTB prices have grown 4.6X over desktop RTB prices measured in 2013, and 3.8X over mobile RTB prices measured in 2015. We also study how user demographics associate with the intensity of RTB ecosystem tracking, leading to higher ad prices. We find that exchanging data between advertisers and/or data brokers through cookie-synchronization increases the median value of displayed ads by 19%. We also find that female and younger users are more targeted, suffering more tracking (via cookie synchronization) than male or elder users. As a result of this targeting in our dataset, the advertising value (i) of women is 2.4X higher than that of men, (ii) of 25-34 year-olds is 2.5X higher than that of 35-44 year-olds, (iii) is most expensive on weekends and early mornings.

CY Feb 18, 2019
Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls

Panagiotis Papadopoulos, Peter Snyder, Dimitrios Athanasakis et al.

Funding the production of quality online content is a pressing problem for content producers. The most common funding method, online advertising, is rife with well-known performance and privacy harms, and an intractable subject-agent conflict: many users do not want to see advertisements, depriving the site of needed funding. Because of these negative aspects of advertisement-based funding, paywalls are an increasingly popular alternative for websites. This shift to a "pay-for-access" web is one that has potentially huge implications for the web and society. Instead of a system where information (nominally) flows freely, paywalls create a web where high quality information is available to fewer and fewer people, leaving the rest of the web users with less information, which might also be less accurate and of lower quality. Despite the potential significance of a move from an "advertising-but-open" web to a "paywalled" web, we find this issue understudied. This work addresses this gap in our understanding by measuring how widely paywalls have been adopted, what kinds of sites use paywalls, and the distribution of policies enforced by paywalls. A partial list of our findings includes that (i) paywall use is accelerating (2x more paywalls every 6 months), (ii) paywall adoption differs by country (e.g. 18.75% in US, 12.69% in Australia), (iii) paywalls change how users interact with sites (e.g. higher bounce rates, fewer incoming links), (iv) the median cost of an annual paywall access is $108 per site, and (v) paywalls are in general trivial to circumvent. Finally, we present the design of a novel, automated system for detecting whether a site uses a paywall, through the combination of runtime browser instrumentation and repeated programmatic interactions with the site. We intend this classifier to augment future, longitudinal measurements of paywall use and behavior.
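
Finding (i) describes compounding growth. As a rough illustration only, and assuming the observed doubling rate stayed constant (the abstract reports a measured rate, not a forecast), the number of paywalled sites after a given number of months could be projected as in the Python sketch below. The function name and the starting count in the usage note are hypothetical.

```python
def projected_paywalls(current_count, months, doubling_months=6.0):
    """Project a count forward under constant doubling every `doubling_months`.

    Assumption: the '2x every 6 months' rate from the abstract holds unchanged.
    """
    return current_count * 2 ** (months / doubling_months)
```

For a hypothetical starting point of 100 paywalled sites, `projected_paywalls(100, 12)` yields 400.0, i.e. doubling every 6 months compounds to a fourfold increase per year.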

CR Sep 30, 2018
Master of Web Puppets: Abusing Web Browsers for Persistent and Stealthy Computation

Panagiotis Papadopoulos, Panagiotis Ilia, Michalis Polychronakis et al.

The proliferation of web applications has essentially transformed modern browsers into small but powerful operating systems. Upon visiting a website, user devices run implicitly trusted script code, the execution of which is confined within the browser to prevent any interference with the user's system. Recent JavaScript APIs, however, provide advanced capabilities that not only enable feature-rich web applications, but also allow attackers to perform malicious operations despite the confined nature of JavaScript code execution. In this paper, we demonstrate the powerful capabilities that modern browser APIs provide to attackers by presenting MarioNet: a framework that allows a remote malicious entity to control a visitor's browser and abuse its resources for unwanted computation or harmful operations, such as cryptocurrency mining, password-cracking, and DDoS. MarioNet relies solely on already available HTML5 APIs, without requiring the installation of any additional software. In contrast to previous browser-based botnets, the persistence and stealthiness characteristics of MarioNet allow the malicious computations to continue in the background of the browser even after the user closes the window or tab of the initial malicious website. We present the design, implementation, and evaluation of a prototype system, MarioNet, that is compatible with all major browsers, and discuss potential defense strategies to counter the threat of such persistent in-browser attacks. Our main goal is to raise awareness regarding this new class of attacks, and inform the design of future browser APIs so that they provide a more secure client-side environment for web applications.

CR Jun 6, 2018
Truth in Web Mining: Measuring the Profitability and Cost of Cryptominers as a Web Monetization Model

Panagiotis Papadopoulos, Panagiotis Ilia, Evangelos P. Markatos

The recent advances of web-based cryptomining libraries along with the whopping market value of cryptocoins have convinced an increasing number of publishers to switch to web mining as a source of monetization for their websites. The conditions could not be better nowadays: the inevitable arms race between adblockers and advertisers is at its peak with publishers caught in the crossfire. But can cryptomining be the next primary monetization model in the post-advertising era of free Internet? In this paper, we respond to this exact question. In particular, we compare the profitability of cryptomining and advertising to assess the most advantageous option for a content provider. In addition, we measure the costs imposed on the user in each case with regard to power consumption, resource utilization, network traffic, device temperature and user experience. Our results show that cryptomining can surpass the profitability of advertising under specific circumstances; however, users need to sustain a significant cost on their devices.

IR May 26, 2018
Cookie Synchronization: Everything You Always Wanted to Know But Were Afraid to Ask

Panagiotis Papadopoulos, Nicolas Kourtellis, Evangelos P. Markatos

User data is the primary input of digital advertising, fueling the free Internet as we know it. As a result, web companies invest a lot in elaborate tracking mechanisms to acquire user data that they can sell to data markets and advertisers. However, with same-origin policy, and cookies as a primary identification mechanism on the web, each tracker knows the same user with a different ID. To mitigate this, Cookie Synchronization (CSync) came to the rescue, facilitating an information sharing channel between third parties that may or may not have direct access to the website the user visits. In the background, with CSync, they merge user data they own, but also reconstruct a user's browsing history, bypassing the same origin policy. In this paper, we perform a first to our knowledge in-depth study of CSync in the wild, using a year-long weblog from 850 real mobile users. Through our study, we aim to understand the characteristics of the CSync protocol and the impact it has on web users' privacy. For this, we design and implement CONRAD, a holistic mechanism to detect CSync events in real time, and the privacy loss on the user side, even when the synced IDs are obfuscated. Using CONRAD, we find that 97% of the regular web users are exposed to CSync: most of them within the first week of their browsing, and the median userID gets leaked, on average, to 3.5 different domains. Finally, we see that CSync increases the number of domains that track the user by a factor of 6.75.
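
The core idea behind CSync, one domain's user ID showing up in a request to a different domain, can be illustrated with a short Python sketch. This is not CONRAD: the abstract does not describe its detection logic, and the sketch below does not handle obfuscated IDs. The event format (`url`, `cookies`), the function name, and the `min_id_len` threshold are all assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qsl


def detect_csync(events, min_id_len=8):
    """Flag plain-text cookie synchronization in a chronological request log.

    events: iterable of dicts with 'url' (the request URL) and 'cookies'
            (dict of cookie name -> value seen for that request's domain).
    Returns a list of (source_domain, destination_domain, id_value) tuples.
    Domains are full hostnames; a real tool would likely group by registrable domain.
    """
    seen_ids = {}  # ID value -> hostname that first held it as a cookie
    syncs = []
    for ev in events:
        parsed = urlparse(ev["url"])
        domain = parsed.hostname
        # Step 1: does a query parameter carry an ID first seen on another domain?
        for _, value in parse_qsl(parsed.query):
            owner = seen_ids.get(value)
            if owner is not None and owner != domain:
                syncs.append((owner, domain, value))
        # Step 2: remember ID-like cookie values held by this domain.
        for value in ev.get("cookies", {}).values():
            if len(value) >= min_id_len:
                seen_ids.setdefault(value, domain)
    return syncs
```

For example, if `tracker-a.example` holds the cookie value `abc123def456` and a later request goes to `https://tracker-b.example/sync?partner_uid=abc123def456`, the function reports one sync from `tracker-a.example` to `tracker-b.example`. After such an event, the second domain can link its own ID for the user to the first domain's ID, which is how the merged profiles described above become possible.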

GT Jan 24, 2017
If you are not paying for it, you are the product: How much do advertisers pay to reach you?

Panagiotis Papadopoulos, Nicolas Kourtellis, Pablo Rodriguez Rodriguez et al.

Online advertising is progressively moving towards a programmatic model in which ads are matched to actual interests of individuals collected as they browse the web. Leaving the huge debate around privacy aside, a very important question in this area, about which little is known, is: How much do advertisers pay to reach an individual? In this study, we develop a first of its kind methodology for computing exactly that -- the price paid for a web user by the ad ecosystem -- and we do that in real time. Our approach is based on tapping into the Real Time Bidding (RTB) protocol to collect cleartext and encrypted prices for winning bids paid by advertisers in order to place targeted ads. Our main technical contribution is a method for tallying winning bids even when they are encrypted. We achieve this by training a model using as ground truth prices obtained by running our own "probe" ad-campaigns. We design our methodology through a browser extension and a back-end server that provides it with fresh models for encrypted bids. We validate our methodology using a one year long trace of 1600 mobile users and demonstrate that it can estimate a user's advertising worth with more than 82% accuracy.