Benjamin Livshits

CR
30papers
1,212citations
Novelty56%
AI Score44

30 Papers

RMFeb 19, 2023
Auto.gov: Learning-based Governance for Decentralized Finance (DeFi)

Jiahua Xu, Yebo Feng, Daniel Perez et al.

Decentralized finance (DeFi) is an integral component of the blockchain ecosystem, enabling a range of financial activities through smart-contract-based protocols. Traditional DeFi governance typically involves manual parameter adjustments by protocol teams or token holder votes, and is thus prone to human bias and financial risks, undermining the system's integrity and security. While existing efforts aim to establish more adaptive parameter adjustment schemes, there remains a need for a governance model that is both more efficient and resilient to significant market manipulations. In this paper, we introduce "Auto$.$gov", a learning-based governance framework that employs a deep Qnetwork (DQN) reinforcement learning (RL) strategy to perform semi-automated, data-driven parameter adjustments. We create a DeFi environment with an encoded action-state space akin to the Aave lending protocol for simulation and testing purposes, where Auto$.$gov has demonstrated the capability to retain funds that would have otherwise been lost to price oracle attacks. In tests with real-world data, Auto$.$gov outperforms the benchmark approaches by at least 14% and the static baseline model by tenfold, in terms of the preset performance metric--protocol profitability. Overall, the comprehensive evaluations confirm that Auto$.$gov is more efficient and effective than traditional governance methods, thereby enhancing the security, profitability, and ultimately, the sustainability of DeFi protocols.

CRDec 12, 2021Code
Pool-Party: Exploiting Browser Resource Pools as Side-Channels for Web Tracking

Peter Snyder, Soroush Karami, Arthur Edelstein et al.

We identify class of covert channels in browsers that are not mitigated by current defenses, which we call "pool-party" attacks. Pool-party attacks allow sites to create covert channels by manipulating limited-but-unpartitioned resource pools. These class of attacks have been known, but in this work we show that they are both more prevalent, more practical for exploitation, and allow exploitation in more ways, than previously identified. These covert channels have sufficient bandwidth to pass cookies and identifiers across site boundaries under practical and real-world conditions. We identify pool-party attacks in all popular browsers, and show they are practical cross-site tracking techniques (i.e., attacks take 0.6s in Chrome and Edge, and 7s in Firefox and Tor Browser). In this paper we make the following contributions: first, we describe pool-party covert channel attacks that exploit limits in application-layer resource pools in browsers. Second, we demonstrate that pool-party attacks are practical, and can be used to track users in all popular browsers; we also share open source implementations of the attack and evaluate them through a representative web crawl. Third, we show that in Gecko based-browsers (including the Tor Browser) pool-party attacks can also be used for cross-profile tracking (e.g., linking user behavior across normal and private browsing sessions). Finally, we discuss possible mitigation strategies and defenses

CRSep 21, 2021Code
STAR: Secret Sharing for Private Threshold Aggregation Reporting

Alex Davidson, Peter Snyder, E. B. Quirk et al.

Threshold aggregation reporting systems promise a practical, privacy-preserving solution for developers to learn how their applications are used "\emph{in-the-wild}". Unfortunately, proposed systems to date prove impractical for wide scale adoption, suffering from a combination of requiring: \emph{i)} prohibitive trust assumptions; \emph{ii)} high computation costs; or \emph{iii)} massive user bases. As a result, adoption of truly-private approaches has been limited to only a small number of enormous (and enormously costly) projects. In this work, we improve the state of private data collection by proposing $\mathsf{STAR}$, a highly efficient, easily deployable system for providing cryptographically-enforced $κ$-anonymity protections on user data collection. The $\mathsf{STAR}$ protocol is easy to implement and cheap to run, all while providing privacy properties similar to, or exceeding the current state-of-the-art. Measurements of our open-source implementation of $\mathsf{STAR}$ find that it is $1773\times$ quicker, requires $62.4\times$ less communication, and is $24\times$ cheaper to run than the existing state-of-the-art.

CRNov 2, 2020Code
There's No Trick, Its Just a Simple Trick: A Web-Compat and Privacy Improving Approach to Third-party Web Storage

Jordan Jueckstock, Peter Snyder, Shaown Sarker et al.

While much current web privacy research focuses on browser fingerprinting, the boring fact is that the majority of current third-party web tracking is conducted using traditional, persistent-state identifiers. One possible explanation for the privacy community's focus on fingerprinting is that to date browsers have faced a lose-lose dilemma when dealing with third-party stateful identifiers: block state in third-party frames and break a significant number of webpages, or allow state in third-party frames and enable pervasive tracking. The alternative, middle-ground solutions that have been deployed all trade privacy for compatibility, rely on manually curated lists, or depend on the user to manage state and state-access themselves. This work furthers privacy on the web by presenting a novel system for managing the lifetime of third-party storage, "page-length storage". We compare page-length storage to existing approaches for managing third-party state and find that page-length storage has the privacy protections of the most restrictive current option (i.e., blocking third-party storage) but web-compatibility properties mostly similar to the least restrictive option (i.e., allowing all third-party storage). This work further compares page-length storage to an alternative third-party storage partitioning scheme and finds that page-length storage provides superior privacy protections with comparable web-compatibility. We provide a dataset of the privacy and compatibility behaviors observed when applying the compared third-party storage strategies on a crawl of the Tranco 1k and the quantitative metrics used to demonstrate that page-length storage matches or surpasses existing approaches. Finally, we provide an open-source implementation of our page-length storage approach, implemented as patches against Chromium.

CRFeb 9
LLMs + Security = Trouble

Benjamin Livshits

We argue that when it comes to producing secure code with AI, the prevailing "fighting fire with fire" approach -- using probabilistic AI-based checkers or attackers to secure probabilistically generated code -- fails to address the long tail of security bugs. As a result, systems may remain exposed to zero-day vulnerabilities that can be discovered by better-resourced or more persistent adversaries. While neurosymbolic approaches that combine LLMs with formal methods are attractive in principle, we argue that they are difficult to reconcile with the "vibe coding" workflow common in LLM-assisted development: unless the end-to-end verification pipeline is fully automated, developers are repeatedly asked to validate specifications, resolve ambiguities, and adjudicate failures, making the human-in-the-loop a likely point of weakness, compromising secure-by-construction guarantees. In this paper we argue that stronger security guarantees can be obtained by enforcing security constraints during code generation (e.g., via constrained decoding), rather than relying solely on post-hoc detection and repair. This direction is particularly promising for diffusion-style code models, whose approach provides a natural elegant opportunity for modular, hierarchical security enforcement, allowing us to combine lower-latency generation techniques with generating secure-by-construction code.

CRSep 23, 2021
Towards Private On-Chain Algorithmic Trading

Ceren Kocaoğullar, Arthur Gervais, Benjamin Livshits

While quantitative automation related to trading crypto-assets such as ERC-20 tokens has become relatively commonplace, with services such as 3Commas and Shrimpy offering user-friendly web-driven services for even the average crypto trader, we have not yet seen the emergence of on-chain trading as a phenomenon. We hypothesize that just like decentralized exchanges (DEXes) that by now are by some measures more popular than traditional exchanges, process in the space of decentralized finance (DeFi) may enable attractive online trading automation options. In this paper we present ChainBot, an approach for creating algorithmic trading bots with the help of blockchain technology. We show how to partition the computation into on- and off-chain components in a way that provides a measure of end-to-end integrity, while preserving the algorithmic "secret sauce". Our system is enabled with a careful use of algorithm partitioning, zero-knowledge proofs and smart contracts. We also show that with layer-2 (L2) technologies, trades can be kept private, which means that algorithmic parameters are difficult to recover by a chain observer. Our approach offers more transparent access to liquidity and better censorship-resistance compared to traditional off-chain trading approaches. We develop a sample ChainBot and train it on historical data, resulting in returns that are up to 2.4x the buy-and-hold strategy, which we use as our baseline. Our measurements show that across 1000 runs, the end-to-end average execution time for our system is 48.4 seconds. We demonstrate that the frequency of trading does not significantly affect the rate of return and Sharpe ratio, which indicates that we do not have to trade at every block, thereby significantly saving in terms of gas fees. In our implementation, a user who invests \$1,000 would earn \$105, and spend \$3 on gas; assuming a user pool of 1,000 subscribers.

CRSep 16, 2021
PrivateFetch: Scalable Catalog Delivery in Privacy-Preserving Advertising

Muhammad Haris Mughees, Gonçalo Pestana, Alex Davidson et al.

In order to preserve the possibility of an Internet that is free at the point of use, attention is turning to new solutions that would allow targeted advertisement delivery based on behavioral information such as user preferences, without compromising user privacy. Recently, explorations in devising such systems either take approaches that rely on semantic guarantees like $k$-anonymity -- which can be easily subverted when combining with alternative information, and do not take into account the possibility that even knowledge of such clusters is privacy-invasive in themselves. Other approaches provide full privacy by moving all data and processing logic to clients -- but which is prohibitively expensive for both clients and servers. In this work, we devise a new framework called PrivateFetch for building practical ad-delivery pipelines that rely on cryptographic hardness and best-case privacy, rather than syntactic privacy guarantees or reliance on real-world anonymization tools. PrivateFetch utilizes local computation of preferences followed by high-performance single-server private information retrieval (PIR) to ensure that clients can pre-fetch ad content from servers, without revealing any of their inherent characteristics to the content provider. When considering an database of $>1,000,000$ ads, we show that we can deliver $30$ ads to a client in 40 seconds, with total communication costs of 192KB. We also demonstrate the feasibility of PrivateFetch by showing that the monetary cost of running it is less than 1% of average ad revenue. As such, our system is capable of pre-fetching ads for clients based on behavioral and contextual user information, before displaying them during a typical browsing session. In addition, while we test PrivateFetch as a private ad-delivery, the generality of our approach means that it could also be used for other content types.

CRSep 14, 2021
Security, Privacy, and Decentralization in Web3

Philipp Winter, Anna Harbluk Lorimer, Peter Snyder et al.

Much of the recent excitement around decentralized finance (DeFi) comes from hopes that DeFi can be a secure, private, less centralized alternative to traditional finance systems. However, people moving to DeFi sites in hopes of improving their security and privacy may end up with less of both as recent attacks have demonstrated. In this work, we improve the understanding of DeFi by conducting the first Web measurements of the security, privacy, and decentralization properties of popular DeFi front ends. We find that DeFi applications -- or dapps -- suffer from the same security and privacy risks that frequent other parts of the Web but those risks are greatly exacerbated considering the money that is involved in DeFi. Our results show that a common tracker can observe user behavior on over 56% of websites we analyzed and many trackers on DeFi sites can trivially link a user's Ethereum address with PII (e.g., user name or demographic information), or phish users by initiating fake Ethereum transactions. Lastly, we establish that despite claims to the opposite, because of companies like Amazon and Cloudflare operating significant Web infrastructure, DeFi as a whole is considerably less decentralized than previously believed.

CRJun 3, 2021
THEMIS: A Decentralized Privacy-Preserving Ad Platform with Reporting Integrity

Gonçalo Pestana, Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos et al.

Online advertising fuels the (seemingly) free internet. However, although users can access most of the web services free of charge, they pay a heavy coston their privacy. They are forced to trust third parties and intermediaries, who not only collect behavioral data but also absorb great amounts of ad revenues. Consequently, more and more users opt out from advertising by resorting to ad blockers, thus costing publishers millions of dollars in lost ad revenues. Albeit there are various privacy-preserving advertising proposals (e.g.,Adnostic, Privad, Brave Ads) from both academia and industry, they all rely on centralized management that users have to blindly trust without being able to audit, while they also fail to guarantee the integrity of the per-formance analytics they provide to advertisers. In this paper, we design and deploy THEMIS, a novel, decentralized and privacy-by-design ad platform that requires zero trust by users. THEMIS (i) provides auditability to its participants, (ii) rewards users for viewing ads, and (iii) allows advertisers to verify the performance and billing reports of their ad campaigns. By leveraging smart contracts and zero-knowledge schemes, we implement a prototype of THEMIS and early performance evaluation results show that it can scale linearly on a multi sidechain setup while it supports more than 51M users on a single-sidechain.

LGMay 9, 2021
Stronger Privacy for Federated Collaborative Filtering with Implicit Feedback

Lorenzo Minto, Moritz Haller, Hamed Haddadi et al.

Recommender systems are commonly trained on centrally collected user interaction data like views or clicks. This practice however raises serious privacy concerns regarding the recommender's collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters $ε$ and $k$, regulating the per-update privacy budget and the number of $ε$-LDP gradient updates sent by each user respectively. To further protect the user's privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving up to Hit Ratio with K=10 (HR@10) 0.68 on 50k users with 5k items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10>0.5 without compromising user privacy.

CRMar 3, 2021
On the Just-In-Time Discovery of Profit-Generating Transactions in DeFi Protocols

Liyi Zhou, Kaihua Qin, Antoine Cully et al.

In this paper, we investigate two methods that allow us to automatically create profitable DeFi trades, one well-suited to arbitrage and the other applicable to more complicated settings. We first adopt the Bellman-Ford-Moore algorithm with DEFIPOSER-ARB and then create logical DeFi protocol models for a theorem prover in DEFIPOSER-SMT. While DEFIPOSER-ARB focuses on DeFi transactions that form a cycle and performs very well for arbitrage, DEFIPOSER-SMT can detect more complicated profitable transactions. We estimate that DEFIPOSER-ARB and DEFIPOSER-SMT can generate an average weekly revenue of 191.48ETH (76,592USD) and 72.44ETH (28,976USD) respectively, with the highest transaction revenue being 81.31ETH(32,524USD) and22.40ETH (8,960USD) respectively. We further show that DEFIPOSER-SMT finds the known economic bZx attack from February 2020, which yields 0.48M USD. Our forensic investigations show that this opportunity existed for 69 days and could have yielded more revenue if exploited one day earlier. Our evaluation spans 150 days, given 96 DeFi protocol actions, and 25 assets. Looking beyond the financial gains mentioned above, forks deteriorate the blockchain consensus security, as they increase the risks of double-spending and selfish mining. We explore the implications of DEFIPOSER-ARB and DEFIPOSER-SMT on blockchain consensus. Specifically, we show that the trades identified by our tools exceed the Ethereum block reward by up to 874x. Given optimal adversarial strategies provided by a Markov Decision Process (MDP), we quantify the value threshold at which a profitable transaction qualifies as Miner ExtractableValue (MEV) and would incentivize MEV-aware miners to fork the blockchain. For instance, we find that on Ethereum, a miner with a hash rate of 10% would fork the blockchain if an MEV opportunity exceeds 4x the block reward.

CRJul 10, 2020
THEMIS: Decentralized and Trustless Ad Platform with Reporting Integrity

Gonçalo Pestana, Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos et al.

Online advertising fuels the (seemingly) free internet. However, although users can access most websites free of charge, they need to pay a heavy cost on their privacy and blindly trust third parties and intermediaries that absorb great amounts of adrevenues and user data. This is one of the reasons users opt out from advertising by resorting ad blockers thatin turn cost publishers millions of dollars in lost adrevenues. Existing privacy-preserving advertising approaches(e.g., Adnostic, Privad, Brave Ads) from both industry and academia cannot guarantee the integrity of the performance analytics they provide to advertisers, while they also rely on centralized management that users have to trust without being able to audit. In this paper, we propose THEMIS, a novel privacy-by-design ad platform that is decentralized and requires zero trust from users. THEMIS (i) provides auditability to all participants, (ii) rewards users for viewing ads, and (iii) allows advertisers to verify the performance and billing reports of their ad campaigns. To demonstrate the feasibility and practicability of our approach, we implemented a prototype of THEMIS using a combination of smart contracts and zero-knowledge schemes. Performance evaluation results show that during adreward payouts, THEMIS can support more than 51M users on a single-sidechain setup or 153M users ona multi-sidechain setup, thus proving that THEMIS scales linearly.

CRMar 8, 2020
Attacking the DeFi Ecosystem with Flash Loans for Fun and Profit

Kaihua Qin, Liyi Zhou, Benjamin Livshits et al.

Credit allows a lender to loan out surplus capital to a borrower. In the traditional economy, credit bears the risk that the borrower may default on its debt, the lender hence requires upfront collateral from the borrower, plus interest fee payments. Due to the atomicity of blockchain transactions, lenders can offer flash loans, i.e., loans that are only valid within one transaction and must be repaid by the end of that transaction. This concept has lead to a number of interesting attack possibilities, some of which were exploited in February 2020. This paper is the first to explore the implication of transaction atomicity and flash loans for the nascent decentralized finance (DeFi) ecosystem. We show quantitatively how transaction atomicity increases the arbitrage revenue. We moreover analyze two existing attacks with ROIs beyond 500k%. We formulate finding the attack parameters as an optimization problem over the state of the underlying Ethereum blockchain and the state of the DeFi ecosystem. We show how malicious adversaries can efficiently maximize an attack profit and hence damage the DeFi ecosystem further. Specifically, we present how two previously executed attacks can be "boosted" to result in a profit of 829.5k USD and 1.1M USD, respectively, which is a boost of 2.37x and 1.73x, respectively.

CRMar 5, 2020
Revisiting Transactional Statistics of High-scalability Blockchains

Daniel Perez, Jiahua Xu, Benjamin Livshits

Scalability has been a bottleneck for major blockchains such as Bitcoin and Ethereum. Despite the significantly improved scalability claimed by several high--profile blockchain projects, there has been little effort to understand how their transactional throughput is being used. In this paper, we examine recent network traffic of three major high-scalability blockchains--EOSIO, Tezos and XRP Ledger (XRPL)--over a period of seven months. Our analysis reveals that only a small fraction of the transactions are used for value transfer purposes. In particular, 96% of the transactions on EOSIO were triggered by the airdrop of a currently valueless token; on Tezos, 76% of throughput was used for maintaining consensus; and over 94% of transactions on XRPL carried no economic value. We also identify a persisting airdrop on EOSIO as a DoS attack and detect a two-month-long spam attack on XRPL. The paper explores the different designs of the three blockchains and sheds light on how they could shape user behavior.

CRFeb 19, 2020
The Decentralized Financial Crisis

Lewis Gudgeon, Daniel Perez, Dominik Harz et al.

The Global Financial Crisis of 2008, caused by the accumulation of excessive financial risk, inspired Satoshi Nakamoto to create Bitcoin. Now, more than ten years later, Decentralized Finance (DeFi), a peer-to-peer financial paradigm which leverages blockchain-based smart contracts to ensure its integrity and security, contains over 702m USD of capital as of April 15th, 2020. As this ecosystem develops, it is at risk of the very sort of financial meltdown it is supposed to be preventing. In this paper we explore how design weaknesses and price fluctuations in DeFi protocols could lead to a DeFi crisis. We focus on DeFi lending protocols as they currently constitute most of the DeFi ecosystem with a 76% market share by capital as of April 15th, 2020. First, we demonstrate the feasibility of attacking Maker's governance design to take full control of the protocol, the largest DeFi protocol by market share, which would have allowed the theft of 0.5bn USD of collateral and the minting of an unlimited supply of DAI tokens. In doing so, we present a novel strategy utilizing so-called flash loans that would have in principle allowed the execution of the governance attack in just two transactions and without the need to lock any assets. Approximately two weeks after we disclosed the attack details, Maker modified the governance parameters mitigating the attack vectors. Second, we turn to a central component of financial risk in DeFi lending protocols. Inspired by stress-testing as performed by central banks, we develop a stress-testing framework for a stylized DeFi lending protocol, focusing our attention on the impact of a drying-up of liquidity on protocol solvency. Based on our parameters, we find that with sufficiently illiquidity a lending protocol with a total debt of 400m USD could become undercollateralized within 19 days.

CRNov 18, 2019
ZKSENSE: A Friction-less Privacy-Preserving Human Attestation Mechanism for Mobile Devices

Iñigo Querejeta-Azurmendi, Panagiotis Papadopoulos, Matteo Varvello et al.

Recent studies show that 20.4% of the internet traffic originates from automated agents. To identify and block such ill-intentioned traffic, mechanisms that verify the humanness of the user are widely deployed, with CAPTCHAs being the most popular. Traditional CAPTCHAs require extra user effort (e.g., solving mathematical puzzles), which can severely downgrade the end-user's experience, especially on mobile, and provide sporadic humanness verification of questionable accuracy. More recent solutions like Google's reCAPTCHA v3, leverage user data, thus raising significant privacy concerns. To address these issues, we present zkSENSE: the first zero-knowledge proof-based humanness attestation system for mobile devices. zkSENSE moves the human attestation to the edge: onto the user's very own device, where humanness of the user is assessed in a privacy-preserving and seamless manner. zkSENSE achieves this by classifying motion sensor outputs of the mobile device, based on a model trained by using both publicly available sensor data and data collected from a small group of volunteers. To ensure the integrity of the process, the classification result is enclosed in a zero-knowledge proof of humanness that can be safely shared with a remote server. We implement zkSENSE as an Android service to demonstrate its effectiveness and practicality. In our evaluation, we show that zkSENSE successfully verifies the humanness of a user across a variety of attacking scenarios and demonstrates 92% accuracy. On a two years old Samsung S9, zkSENSE's attestation takes around 3 seconds (when visual CAPTCHAs need 9.8 seconds) and consumes a negligible amount of battery.

CROct 16, 2019
Filter List Generation for Underserved Regions

Alexander Sjosten, Peter Snyder, Antonio Pastor et al.

Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced, where a large number of people manually label resources related to undesirable web resources (e.g., ads, trackers, paywall libraries), so that they can be blocked by browsers and extensions. Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on "popular" websites, or resources that appear on a large number of "unpopular" websites. A crowd-sourcing strategy will perform poorly for parts of the web with small "crowds", such as regions of the web serving languages with (relatively) few speakers. This work addresses this problem through the combination of two novel techniques: (i) deep browser instrumentation that allows for the accurate generation of request chains, in a way that is robust in situations that confuse existing measurement techniques, and (ii) an ad classifier that uniquely combines perceptual and page-context features to remain accurate across multiple languages. We apply our unique two-step filter list generation pipeline to three regions of the web that currently have poorly maintained filter lists: Sri Lanka, Hungary, and Albania. We generate new filter lists that complement existing filter lists. Our complementary lists block an additional 3,349 of ad and ad-related resources (1,771 unique) when applied to 6,475 pages targeting these three regions. We hope that this work can be part of an increased effort at ensuring that the security, privacy, and performance benefits of web resource blocking can be shared with all users, and not only those in dominant linguistic or economic regions.

CRSep 16, 2019
Broken Metre: Attacking Resource Metering in EVM

Daniel Perez, Benjamin Livshits

Blockchain systems, such as Ethereum, use an approach called "metering" to assign a cost to smart contract execution, an approach which is designed to incentivise miners to operate the network and protect it against DoS attacks. In the past, the imperfections of Ethereum metering allowed several DoS attacks which were countered through modification of the metering mechanism. This paper presents a new DoS attack on Ethereum which systematically exploits its metering mechanism. We first replay and analyse several months of transactions, during which we discover a number of discrepancies in the metering model, such as significant inconsistencies in the pricing of the instructions. We further demonstrate that there is very little correlation between the execution cost and the utilised resources, such as CPU and memory. Based on these observations, we present a new type of DoS attack we call Resource Exhaustion Attack, which uses these imperfections to generate low-throughput contracts. To do this, we design a genetic algorithm that generates contracts with a throughput on average 200 times slower than typical contracts. We then show that all major Ethereum client implementations are vulnerable and, if running on commodity hardware, would be unable to stay in sync with the network when under attack. We argue that such an attack could be financially attractive not only for Ethereum competitors and speculators, but also for Ethereum miners. Finally, we discuss short-term and potential long-term fixes against such attacks. Our attack has been responsibly disclosed to the Ethereum Foundation and awarded a bug bounty reward of 5,000 USD.

LGSep 10, 2019
Privacy-Preserving Bandits

Mohammad Malekzadeh, Dimitrios Athanasakis, Hamed Haddadi et al.

Contextual bandit algorithms~(CBAs) often rely on personal data to provide recommendations. Centralized CBA agents utilize potentially sensitive data from recent interactions to provide personalization to end-users. Keeping the sensitive data locally, by running a local agent on the user's device, protects the user's privacy, however, the agent requires longer to produce useful recommendations, as it does not leverage feedback from other users. This paper proposes a technique we call Privacy-Preserving Bandits (P2B); a system that updates local agents by collecting feedback from other local agents in a differentially-private manner. Comparisons of our proposed approach with a non-private, as well as a fully-private (local) system, show competitive performance on both synthetic benchmarks and real-world data. Specifically, we observed only a decrease of 2.6% and 3.6% in multi-label classification accuracy, and a CTR increase of 0.0025 in online advertising for a privacy budget $ε\approx 0.693$. These results suggest P2B is an effective approach to challenges arising in on-device privacy-preserving personalization.

CRMay 17, 2019
Percival: Making In-Browser Perceptual Ad Blocking Practical With Deep Learning

Zain ul abi Din, Panagiotis Tigas, Samuel T. King et al.

In this paper we present Percival, a browser-embedded, lightweight, deep learning-powered ad blocker. Percival embeds itself within the browser's image rendering pipeline, which makes it possible to intercept every image obtained during page execution and to perform blocking based on applying machine learning for image classification to flag potential ads. Our implementation inside both Chromium and Brave browsers shows only a minor rendering performance overhead of 4.55%, demonstrating the feasibility of deploying traditionally heavy models (i.e. deep neural networks) inside the critical path of the rendering engine of a browser. We show that our image-based ad blocker can replicate EasyList rules with an accuracy of 96.76%. To show the versatility of the Percival's approach we present case studies that demonstrate that Percival 1) does surprisingly well on ads in languages other than English; 2) Percival also performs well on blocking first-party Facebook ads, which have presented issues for other ad blockers. Percival proves that image-based perceptual ad blocking is an attractive complement to today's dominant approach of block lists

CYFeb 18, 2019
Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls

Panagiotis Papadopoulos, Peter Snyder, Dimitrios Athanasakis et al.

Funding the production of quality online content is a pressing problem for content producers. The most common funding method, online advertising, is rife with well-known performance and privacy harms, and an intractable subject-agent conflict: many users do not want to see advertisements, depriving the site of needed funding. Because of these negative aspects of advertisement-based funding, paywalls are an increasingly popular alternative for websites. This shift to a "pay-for-access" web is one that has potentially huge implications for the web and society. Instead of a system where information (nominally) flows freely, paywalls create a web where high quality information is available to fewer and fewer people, leaving the rest of the web users with less information, that might be also less accurate and of lower quality. Despite the potential significance of a move from an "advertising-but-open" web to a "paywalled" web, we find this issue understudied. This work addresses this gap in our understanding by measuring how widely paywalls have been adopted, what kinds of sites use paywalls, and the distribution of policies enforced by paywalls. A partial list of our findings include that (i) paywall use is accelerating (2x more paywalls every 6 months), (ii) paywall adoption differs by country (e.g. 18.75% in US, 12.69% in Australia), (iii) paywalls change how users interact with sites (e.g. higher bounce rates, less incoming links), (iv) the median cost of an annual paywall access is $108 per site, and (v) paywalls are in general trivial to circumvent. Finally, we present the design of a novel, automated system for detecting whether a site uses a paywall, through the combination of runtime browser instrumentation and repeated programmatic interactions with the site. We intend this classifier to augment future, longitudinal measurements of paywall use and behavior.

CRFeb 18, 2019
Smart Contract Vulnerabilities: Vulnerable Does Not Imply Exploited

Daniel Perez, Benjamin Livshits

In recent years, we have seen a great deal of both academic and practical interest in the topic of vulnerabilities in smart contracts, particularly those developed for the Ethereum blockchain. While most of the work has focused on detecting *vulnerable* contracts, in this paper, we focus on finding how many of these vulnerable contracts have actually been *exploited*. We survey the 23,327 vulnerable contracts reported by six recent academic projects and find that, despite the amounts at stake, only 1.98% of them have been exploited since deployment. This corresponds to at most 8,487 ETH (~1.7 million USD), or only 0.27% of the 3 million ETH (600 million USD) at stake. We explain these results by demonstrating that the funds are very concentrated in a small number of contracts which are *not exploitable* in practice.

TRNov 25, 2018
The Anatomy of a Cryptocurrency Pump-and-Dump Scheme

Jiahua Xu, Benjamin Livshits

While pump-and-dump schemes have attracted the attention of cryptocurrency observers and regulators alike, this paper represents the first detailed empirical query of pump-and-dump activities in cryptocurrency markets. We present a case study of a recent pump-and-dump event, investigate 412 pump-and-dump activities organized in Telegram channels from June 17, 2018 to February 26, 2019, and discover patterns in crypto-markets associated with pump-and-dump schemes. We then build a model that predicts the pump likelihood of all coins listed in a crypto-exchange prior to a pump. The model exhibits high precision as well as robustness, and can be used to create a simple, yet very effective trading strategy, which we empirically demonstrate can generate a return as high as 60% on small retail investments within a span of two and half months. The study provides a proof of concept for strategic crypto-trading and sheds light on the application of machine learning for crime detection.

HCNov 20, 2018
Evaluating the End-User Experience of Private Browsing Mode

Ruba Abu-Salma, Benjamin Livshits

Nowadays, all major web browsers have a private browsing mode. However, the mode's benefits and limitations are not particularly understood. Through the use of survey studies, prior work has found that most users are either unaware of private browsing or do not use it. Further, those who do use private browsing generally have misconceptions about what protection it provides. However, prior work has not investigated \emph{why} users misunderstand the benefits and limitations of private browsing. In this work, we do so by designing and conducting a three-part study: (1) an analytical approach combining cognitive walkthrough and heuristic evaluation to inspect the user interface of private mode in different browsers; (2) a qualitative, interview-based study to explore users' mental models of private browsing and its security goals; (3) a participatory design study to investigate why existing browser disclosures, the in-browser explanations of private browsing mode, do not communicate the security goals of private browsing to users. Participants critiqued the browser disclosures of three web browsers: Brave, Firefox, and Google Chrome, and then designed new ones. We find that the user interface of private mode in different web browsers violates several well-established design guidelines and heuristics. Further, most participants had incorrect mental models of private browsing, influencing their understanding and usage of private mode. Additionally, we find that existing browser disclosures are not only vague, but also misleading. None of the three studied browser disclosures communicates or explains the primary security goal of private browsing. Drawing from the results of our user study, we extract a set of design recommendations that we encourage browser designers to validate, in order to design more effective and informative browser disclosures related to private mode.

IRNov 8, 2018
SpeedReader: Reader Mode Made Fast and Private

Mohammad Ghasemisharif, Peter Snyder, Andrius Aucinas et al.

Most popular web browsers include "reader modes" that improve the user experience by removing un-useful page elements. Reader modes reformat the page to hide elements that are not related to the page's main content. Such page elements include site navigation, advertising related videos and images, and most JavaScript. The intended end result is that users can enjoy the content they are interested in, without distraction. In this work, we consider whether the "reader mode" can be widened to also provide performance and privacy improvements. Instead of its use as a post-render feature to clean up the clutter on a page we propose SpeedReader as an alternative multistep pipeline that is part of the rendering pipeline. Once the tool decides during the initial phase of a page load that a page is suitable for reader mode use, it directly applies document tree translation before the page is rendered. Based on our measurements, we believe that SpeedReader can be continuously enabled in order to drastically improve end-user experience, especially on slower mobile connections. Combined with our approach to predicting which pages should be rendered in reader mode with 91% accuracy, it achieves drastic speedups and bandwidth reductions of up to 27x and 84x respectively on average. We further find that our novel "reader mode" approach brings with it significant privacy improvements to users. Our approach effectively removes all commonly recognized trackers, issuing 115 fewer requests to third parties, and interacts with 64 fewer trackers on average, on transformed pages.

CROct 22, 2018
Who Filters the Filters: Understanding the Growth, Usefulness and Efficiency of Crowdsourced Ad Blocking

Peter Snyder, Antoine Vastel, Benjamin Livshits

Ad and tracking blocking extensions are popular tools for improving web performance, privacy and aesthetics. Content blocking extensions typically rely on filter lists to decide whether a web request is associated with tracking or advertising, and so should be blocked. Millions of web users rely on filter lists to protect their privacy and improve their browsing experience. Despite their importance, the growth and health of filter lists are poorly understood. Filter lists are maintained by a small number of contributors, who use a variety of undocumented heuristics to determine what rules should be included. Lists quickly accumulate rules, and rules are rarely removed. As a result, users' browsing experiences are degraded as the number of stale, dead or otherwise not useful rules increasingly dwarf the number of useful rules, with no attenuating benefit. An accumulation of "dead weight" rules also makes it difficult to apply filter lists on resource-limited mobile devices. This paper improves the understanding of crowdsourced filter lists by studying EasyList, the most popular filter list. We find that EasyList has grown from several hundred rules, to well over 60,000 rules, during its 9-year history. We measure how EasyList affects web browsing by applying EasyList to a sample of 10,000 websites. We find that 90.16% of the resource blocking rules in EasyList provide no benefit to users in common browsing scenarios. We further use our changes in EasyList application rates to provide a taxonomy of the ways advertisers evade EasyList rules. Finally, we propose optimizations for popular ad-blocking tools, that allow EasyList to be applied on performance constrained mobile devices, and improve desktop performance by 62.5%, while preserving over 99% of blocking coverage.

CYMay 22, 2018
AdGraph: A Graph-Based Approach to Ad and Tracker Blocking

Umar Iqbal, Peter Snyder, Shitong Zhu et al.

User demand for blocking advertising and tracking online is large and growing. Existing tools, both deployed and described in research, have proven useful, but lack either the completeness or robustness needed for a general solution. Existing detection approaches generally focus on only one aspect of advertising or tracking (e.g. URL patterns, code structure), making existing approaches susceptible to evasion. In this work we present AdGraph, a novel graph-based machine learning approach for detecting advertising and tracking resources on the web. AdGraph differs from existing approaches by building a graph representation of the HTML structure, network requests, and JavaScript behavior of a webpage, and using this unique representation to train a classifier for identifying advertising and tracking resources. Because AdGraph considers many aspects of the context a network request takes place in, it is less susceptible to the single-factor evasion techniques that flummox existing approaches. We evaluate AdGraph on the Alexa top-10K websites, and find that it is highly accurate, able to replicate the labels of human-generated filter lists with 95.33% accuracy, and can even identify many mistakes in filter lists. We implement AdGraph as a modification to Chromium. AdGraph adds only minor overhead to page loading and execution, and is actually faster than stock Chromium on 42% of websites and AdBlock Plus on 78% of websites. Overall, we conclude that AdGraph is both accurate enough and performant enough for online use, breaking comparable or fewer websites than popular filter list based approaches.

CRApr 18, 2018
When the signal is in the noise: Exploiting Diffix's Sticky Noise

Andrea Gadotti, Florimond Houssiau, Luc Rocher et al.

Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of "sticky noise", Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article~29 Working Party's definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP's definition of anonymization. [...]

CRFeb 24, 2018
Security: Doing Whatever is Needed... and Not a Thing More!

Omer Katz, Benjamin Livshits

As malware, exploits, and cyber-attacks advance over time, so do the mitigation techniques available to the user. However, while attackers often abandon one form of exploitation in favor of a more lucrative one, mitigation techniques are rarely abandoned. Mitigations are rarely retired or disabled since proving they have outlived their usefulness is often impossible. As a result, performance overheads, maintenance costs, and false positive rates induced by the different mitigations accumulate, culminating in an outdated, inefficient, and costly security solution. We advocate for a new kind of tunable framework on which to base security mechanisms. This new framework enables a more reactive approach to security allowing us to optimize the deployment of security mechanisms based on the current state of attacks. Based on actual evidence of exploitation collected from the field, our framework can choose which mechanisms to enable/disable so that we can minimize the overall costs and false positive rates while maintaining a satisfactory level of security in the system. We use real-world Snort signatures to simulate the benefits of reactively disabling signatures when no evidence of exploitation is observed and compare them to the costs of the current state of deployment. Additionally, we evaluate the responsiveness of our framework and show that in case disabling a security mechanism triggers a reappearance of an attack we can respond in time to prevent mass exploitation. Through large-scale simulations that use integer linear and Bayesian solvers, we discover that our responsive strategy is both computationally affordable and results in significant reductions in false positives (~20% over traces that are about 9 years long), at the cost of introducing a moderate number of false negatives. Finding the optimal sampling strategy takes less than 2.5 minutes in the vast majority of cases.

CRMay 2, 2017
BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model

Brendan Avent, Aleksandra Korolova, David Zeber et al.

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm for the task of privately computing the head of a search log. This blended approach provides significant improvements in the utility of obtained data compared to related work while providing users with their desired privacy guarantees. Specifically, on two large search click data sets, comprising 1.75 and 16 GB respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.