Muhammad Haris Mughees

h-index: 6

3 papers

149 citations

3 Papers

17.4 CR · Jul 10 · Code

Wally: Batched Private Nearest Neighbor Search at Scale

Hilal Asi, Fabian Boemer, Nicholas Genise et al. · apple-ml

We present Wally, a batched private nearest-neighbor search protocol that uses differential privacy to break the linear computation barrier of fully-oblivious schemes. In Tiptoe, the server must process the entire database per query to hide the access pattern, resulting in low throughput (909 QPS) and high communication (17.4 MB) on a 3.2M-entry database. Sublinear alternatives like Pacmann require 614 MB of client storage and an offline streaming phase. Wally's key insight is that fully-oblivious schemes are prohibitively expensive at scale, but the same scale also provides an opportunity. Large-scale systems naturally have many concurrent clients. Wally batches queries from non-coordinating clients, each independently adding fake queries to hide which clusters it accesses. The fake query counts follow a negative binomial distribution, which is non-negative and infinitely divisible, allowing independent sampling without coordination. Clients send queries at random times through an existing anonymization service, avoiding a centralized shuffler. The server sees only an anonymized, noisy stream of cluster accesses that is provably (epsilon, delta)-differentially private, computing over only the relevant clusters. The client encrypts its query under SHE so the server returns only encrypted similarity scores. On a 3.2M-entry database with 500K-query batches, Wally achieves 7-29x higher throughput and 6.7-31x lower communication than Tiptoe, and 15,000x lower client storage than Pacmann, with strong (epsilon=0.1, delta=2^{-26})-DP and comparable accuracy. We also propose optimizations to SHE and keyword PIR yielding 2-3x improvements in PIR and 20-25% in BFV operations, and release an open-source BFV library in Swift.
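The abstract's point about infinite divisibility can be illustrated with a short simulation. This is a minimal sketch, not Wally's implementation: it assumes each of `num_clients` clients draws its fake-query count for a cluster from NB(r / num_clients, p), so that the server-side total is distributed as NB(r, p) without any coordination. The function names and the values of `r`, `p`, and `num_clients` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_negative_binomial(r, p, rng):
    """Draw from NB(r, p) for real r > 0 as a gamma-Poisson mixture.

    The result has mean r * (1 - p) / p and is always a non-negative integer.
    """
    lam = rng.gammavariate(r, (1.0 - p) / p)  # shape r, scale (1 - p) / p
    return sample_poisson(lam, rng)


def client_fake_count(r, p, num_clients, rng):
    """One client's fake-query count for one cluster (hypothetical split).

    Each client uses shape r / num_clients, so the sum over all clients
    is NB(r, p) by infinite divisibility.
    """
    return sample_negative_binomial(r / num_clients, p, rng)


def aggregate_fake_count(r, p, num_clients, rng):
    """Total fake queries the server sees for one cluster in one batch."""
    return sum(client_fake_count(r, p, num_clients, rng)
               for _ in range(num_clients))


if __name__ == "__main__":
    rng = random.Random(0)
    r, p, num_clients = 20.0, 0.5, 200  # illustrative values only
    totals = [aggregate_fake_count(r, p, num_clients, rng) for _ in range(300)]
    print("empirical mean:", sum(totals) / len(totals))
    print("NB(r, p) mean: ", r * (1 - p) / p)
```

With these illustrative parameters most clients add no fake queries for a given cluster, yet the aggregate noise should still match the target distribution, which is why no client needs to know what the others sampled.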

3.8 CR · Sep 16, 2021

PrivateFetch: Scalable Catalog Delivery in Privacy-Preserving Advertising

Muhammad Haris Mughees, Gonçalo Pestana, Alex Davidson et al.

In order to preserve the possibility of an Internet that is free at the point of use, attention is turning to new solutions that would allow targeted advertisement delivery based on behavioral information such as user preferences, without compromising user privacy. Some recent explorations in devising such systems take approaches that rely on semantic guarantees like $k$-anonymity, which can be easily subverted when combined with alternative information and which do not take into account the possibility that even knowledge of such clusters is privacy-invasive in itself. Other approaches provide full privacy by moving all data and processing logic to clients, but this is prohibitively expensive for both clients and servers. In this work, we devise a new framework called PrivateFetch for building practical ad-delivery pipelines that rely on cryptographic hardness and best-case privacy, rather than syntactic privacy guarantees or reliance on real-world anonymization tools. PrivateFetch utilizes local computation of preferences followed by high-performance single-server private information retrieval (PIR) to ensure that clients can pre-fetch ad content from servers without revealing any of their inherent characteristics to the content provider. For a database of $>1,000,000$ ads, we show that we can deliver $30$ ads to a client in 40 seconds, with total communication costs of 192 KB. We also demonstrate the feasibility of PrivateFetch by showing that the monetary cost of running it is less than 1% of average ad revenue. As such, our system is capable of pre-fetching ads for clients based on behavioral and contextual user information, before displaying them during a typical browsing session. In addition, while we test PrivateFetch as a private ad-delivery system, the generality of our approach means that it could also be used for other content types.

3.1 CR · May 19, 2016

A First Look at Ad-block Detection: A New Arms Race on the Web

Muhammad Haris Mughees, Zhiyun Qian, Zubair Shafiq et al.

The rise of ad-blockers is viewed as an economic threat by online publishers, especially those who primarily rely on advertising to support their services. To address this threat, publishers have started retaliating by employing ad-block detectors, which scout for ad-blocker users and react to them by restricting their content access and pushing them to whitelist the website or disable their ad-blockers altogether. The clash between ad-blockers and ad-block detectors has resulted in a new arms race on the web. In this paper, we present the first systematic measurement and analysis of ad-block detection on the web. We have designed and implemented a machine-learning-based technique to automatically detect ad-block detection, and use it to study the deployment of ad-block detectors on the Alexa top-100K websites. The approach is promising, with a precision of 94.8% and a recall of 93.1%. We characterize the spectrum of different strategies used by websites for ad-block detection. We find that most publishers use fairly simple passive approaches for ad-block detection. However, we also note that a few websites use third-party services, e.g., PageFair, for ad-block detection and response. The third-party services use active deception and other sophisticated tactics to detect ad-blockers. We also find that the third-party services can successfully circumvent ad-blockers and display ads on publisher websites.