James Bell

CR
10papers
555citations
Novelty42%
AI Score25

10 Papers

LGMay 18, 2022
No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

Han Wang, Archit Sakhadeo, Adam White et al. · deepmind

The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time consuming. We propose a new approach to tune hyperparameters from offline logs of data, to fully specify the hyperparameters for an RL agent that learns online in the real world. The approach is conceptually simple: we first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model to identify promising hyperparameters. We identify several criteria to make this strategy effective, and develop an approach that satisfies these criteria. We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.

CRJun 25, 2020Code
Differentially Private Health Tokens for Estimating COVID-19 Risk

David Butler, Chris Hicks, James Bell et al.

In the fight against Covid-19, many governments and businesses are in the process of evaluating, trialling and even implementing so-called immunity passports. Also known as antibody or health certificates, there is a clear demand for any technology that could allow people to return to work and other crowded places without placing others at risk. One of the major criticisms of such systems is that they could be misused to unfairly discriminate against those without immunity, allowing the formation of an `immuno-privileged' class of people. In this work we are motivated to explore an alternative technical solution that is non-discriminatory by design. In particular we propose health tokens -- randomised health certificates which, using methods from differential privacy, allow individual test results to be randomised whilst still allowing useful aggregate risk estimates to be calculated. We show that health tokens could mitigate immunity-based discrimination whilst still presenting a viable mechanism for estimating the collective transmission risk posed by small groups of users. We evaluate the viability of our approach in the context of identity-free and identity-binding use cases and then consider a number of possible attacks. Our experimental results show that for groups of size 500 or more, the error associated with our method can be as low as 0.03 on average and thus the aggregated results can be useful in a number of identity-free contexts. Finally, we present the results of our open-source prototype which demonstrates the practicality of our solution.

CRSep 15, 2021
MPC-Friendly Commitments for Publicly Verifiable Covert Security

Nitin Agrawal, James Bell, Adrià Gascón et al.

We address the problem of efficiently verifying a commitment in a two-party computation. This addresses the scenario where a party P1 commits to a value $x$ to be used in a subsequent secure computation with another party P2 that wants to receive assurance that P1 did not cheat, i.e. that $x$ was indeed the value inputted into the secure computation. Our constructions operate in the publicly verifiable covert (PVC) security model, which is a relaxation of the malicious model of MPC appropriate in settings where P1 faces a reputational harm if caught cheating. We introduce the notion of PVC commitment scheme and indexed hash functions to build commitments schemes tailored to the PVC framework, and propose constructions for both arithmetic and Boolean circuits that result in very efficient circuits. From a practical standpoint, our constructions for Boolean circuits are $60\times$ faster to evaluate securely, and use $36\times$ less communication than baseline methods based on hashing. Moreover, we show that our constructions are tight in terms of required non-linear operations, by proving lower bounds on the nonlinear gate count of commitment verification circuits. Finally, we present a technique to amplify the security properties our constructions that allows to efficiently recover malicious guarantees with statistical security.

LGMar 23, 2021
Integrating Novelty Detection Capabilities with MSL Mastcam Operations to Enhance Data Analysis

Paul Horton, Hannah R. Kerner, Samantha Jacob et al.

While innovations in scientific instrumentation have pushed the boundaries of Mars rover mission capabilities, the increase in data complexity has pressured Mars Science Laboratory (MSL) and future Mars rover operations staff to quickly analyze complex data sets to meet progressively shorter tactical and strategic planning timelines. MSLWEB is an internal data tracking tool used by operations staff to perform first pass analysis on MSL image sequences, a series of products taken by the Mast camera, Mastcam. Mastcam's multiband multispectral image sequences require more complex analysis compared to standard 3-band RGB images. Typically, these are analyzed using traditional methods to identify unique features within the sequence. Given the short time frame of tactical planning in which downlinked images might need to be analyzed (within 5-10 hours before the next uplink), there exists a need to triage analysis time to focus on the most important sequences and parts of a sequence. We address this need by creating products for MSLWEB that use novelty detection to help operations staff identify unusual data that might be diagnostic of new or atypical compositions or mineralogies detected within an imaging scene. This was achieved in two ways: 1) by creating products for each sequence to identify novel regions in the image, and 2) by assigning multispectral sequences a sortable novelty score. These new products provide colorized heat maps of inferred novelty that operations staff can use to rapidly review downlinked data and focus their efforts on analyzing potentially new kinds of diagnostic multispectral signatures. This approach has the potential to guide scientists to new discoveries by quickly drawing their attention to often subtle variations not detectable with simple color composites.

CRApr 8, 2020
TraceSecure: Towards Privacy Preserving Contact Tracing

James Bell, David Butler, Chris Hicks et al.

Contact tracing is being widely employed to combat the spread of COVID-19. Many apps have been developed that allow for tracing to be done automatically based off location and interaction data generated by users. There are concerns, however, regarding the privacy and security of users data when using these apps. These concerns are paramount for users who contract the virus, as they are generally required to release all their data. Motivated by the need to protect users privacy we propose two solutions to this problem. Our first solution builds on current "message based" methods and our second leverages ideas from secret sharing and additively homomorphic encryption.

CRFeb 3, 2020
Private Summation in the Multi-Message Shuffle Model

Borja Balle, James Bell, Adria Gascon et al.

The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives privatized messages from users, but in this case a secure shuffler is used to transmit messages from users to the collector in a way that hides which messages came from which user. An interesting feature of the shuffle model is that increasing the amount of messages sent by each user can lead to protocols with accuracies comparable to the ones achievable in the central model. In particular, for the problem of privately computing the sum of $n$ bounded real values held by $n$ different users, Cheu et al. showed that $O(\sqrt{n})$ messages per user suffice to achieve $O(1)$ error (the optimal rate in the central model), while Balle et al. (CRYPTO 2019) recently showed that a single message per user leads to $Θ(n^{1/3})$ MSE (mean squared error), a rate strictly in-between what is achievable in the local and central models. This paper introduces two new protocols for summation in the shuffle model with improved accuracy and communication trade-offs. Our first contribution is a recursive construction based on the protocol from Balle et al. mentioned above, providing $\mathrm{poly}(\log \log n)$ error with $O(\log \log n)$ messages per user. The second contribution is a protocol with $O(1)$ error and $O(1)$ messages per user based on a novel analysis of the reduction from secure summation to shuffling introduced by Ishai et al. (FOCS 2006) (the original reduction required $O(\log n)$ messages per user).

MLOct 9, 2019
Private Protocols for U-Statistics in the Local Model and Beyond

James Bell, Aurélien Bellet, Adrià Gascón et al.

In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical risk measures for machine learning problems such as ranking, clustering and metric learning. We first introduce an LDP protocol based on quantizing the data into bins and applying randomized response, which guarantees an $ε$-LDP estimate with a Mean Squared Error (MSE) of $O(1/\sqrt{n}ε)$ under regularity assumptions on the $U$-statistic or the data distribution. We then propose a specialized protocol for AUC based on a novel use of hierarchical histograms that achieves MSE of $O(α^3/nε^2)$ for arbitrary data distribution. We also show that 2-party secure computation allows to design a protocol with MSE of $O(1/nε^2)$, without any assumption on the kernel function or data distribution and with total communication linear in the number of users $n$. Finally, we evaluate the performance of our protocols through experiments on synthetic and real datasets.

CRSep 24, 2019
Improved Summation from Shuffling

Borja Balle, James Bell, Adria Gascon et al.

A protocol by Ishai et al.\ (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed \emph{shuffle model} of differential privacy, as it allows to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical security $2^{-σ}$, the protocol by Ishai et al.\ requires the number of messages sent by each party to {\em grow} logarithmically with $n$ as $O(\log n + σ)$. In this note we give an improved analysis achieving a dependency of the form $O(1+σ/\log n)$. Conceptually, this addresses the intuitive question left open by Ishai et al.\ of whether the shuffling step in their protocol provides a "hiding in the crowd" amplification effect as $n$ increases. From a practical perspective, our analysis provides explicit constants and shows, for example, that the method of Ishai et al.\ applied to summation of $32$-bit numbers from $n=10^4$ parties sending $12$ messages each provides statistical security $2^{-40}$.

CRJun 20, 2019
Differentially Private Summation with Multi-Message Shuffling

Borja Balle, James Bell, Adria Gascon et al.

In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{ε, δ}(1)$ error and $Θ(ε\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $Ω(1/\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication per party. Whether this gap in number of messages is necessary was left open by Cheu et al. In this note we show a protocol with $O(1/ε)$ error and $O(\log(n/δ))$ messages of size $O(\log(n))$ per party. This protocol is based on the work of Ishai et al.\ (FOCS 2006) showing how to implement distributed summation from secure shuffling, and the observation that this allows simulating the Laplace mechanism in the shuffle model.

LGMar 7, 2019
The Privacy Blanket of the Shuffle Model

Borja Balle, James Bell, Adria Gascon et al.

This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for differential privacy. The shuffle model is the core idea in the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOPS 2017). Recent work by Cheu et al. (EUROCRYPT 2019) analyzes the differential privacy properties of the shuffle model and shows that in some cases shuffled protocols provide strictly better accuracy than local protocols. Additionally, Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying the level of curator differential privacy achieved by the shuffle model in terms of the local differential privacy of the randomizer used by each user. In this context, we make three contributions. First, we provide an optimal single message protocol for summation of real numbers in the shuffle model. Our protocol is very simple and has better accuracy and communication than the protocols for this same problem proposed by Cheu et al. Optimality of this protocol follows from our second contribution, a new lower bound for the accuracy of private protocols for summation of real numbers in the shuffle model. The third contribution is a new amplification bound for analyzing the privacy of protocols in the shuffle model in terms of the privacy provided by the corresponding local randomizer. Our amplification bound generalizes the results by Erlingsson et al. to a wider range of parameters, and provides a whole family of methods to analyze privacy amplification in the shuffle model.