Jeremiah Blocki

CR
32papers
667citations
Novelty59%
AI Score44

32 Papers

22.2CRMar 21
Conditional Encryption with Applications to Secure Personalized Password Typo Correction

Mohammad Hassan Ameri, Jeremiah Blocki

We introduce the notion of a conditional encryption scheme as an extension of public key encryption. In addition to the standard public key algorithms ($\mathsf{KG}$, $\mathsf{Enc}$, $\mathsf{Dec}$) for key generation, encryption and decryption, a conditional encryption scheme for a binary predicate $P$ adds a new conditional encryption algorithm $\mathsf{CEnc}$. The conditional encryption algorithm $c=\mathsf{CEnc}_{pk}(c_1,m_2,m_3)$ takes as input the public encryption key $pk$, a ciphertext $c_1 = \mathsf{Enc}_{pk}(m_1)$ for an unknown message $m_1$, a control message $m_2$ and a payload message $m_3$ and outputs a conditional ciphertext $c$. Intuitively, if $P(m_1,m_2)=1$ then the conditional ciphertext $c$ should decrypt to the payload message $m_3$. On the other hand if $P(m_1,m_2) = 0$ then the ciphertext should not leak any information about the control message $m_2$ or the payload message $m_3$ even if the attacker already has the secret decryption key $sk$. We formalize the notion of conditional encryption secrecy and provide concretely efficient constructions for a set of predicates relevant to password typo correction. Our practical constructions utilize the Paillier partially homomorphic encryption scheme as well as Shamir Secret Sharing. We prove that our constructions are secure and demonstrate how to use conditional encryption to improve the security of personalized password typo correction systems such as TypTop. We implement a C++ library for our practically efficient conditional encryption schemes and evaluate the performance empirically. We also update the implementation of TypTop to utilize conditional encryption for enhanced security guarantees and evaluate the performance of the updated implementation.

CRFeb 21, 2022
Using Illustrations to Communicate Differential Privacy Trust Models: An Investigation of Users' Comprehension, Perception, and Data Sharing Decision

Aiping Xiong, Chuhao Wu, Tianhao Wang et al.

Proper communication is key to the adoption and implementation of differential privacy (DP). However, a prior study found that laypeople did not understand the data perturbation processes of DP and how DP noise protects their sensitive personal information. Consequently, they distrusted the techniques and chose to opt out of participating. In this project, we designed explanative illustrations of three DP models (Central DP, Local DP, Shuffler DP) to help laypeople conceptualize how random noise is added to protect individuals' privacy and preserve group utility. Following pilot surveys and interview studies, we conducted two online experiments (N = 595) examining participants' comprehension, privacy and utility perception, and data-sharing decisions across the three DP models. Besides the comparisons across the three models, we varied the noise levels of each model. We found that the illustrations can be effective in communicating DP to the participants. Given an adequate comprehension of DP, participants preferred strong privacy protection for a certain type of data usage scenarios (i.e., commercial interests) at both the model level and the noise level. We also obtained empirical evidence showing participants' acceptance of the Shuffler DP model for data privacy protection. Our findings have implications for multiple stakeholders for user-centered deployments of differential privacy, including app developers, DP model developers, data curators, and online users.

DSFeb 11, 2022
Privately Estimating Graph Parameters in Sublinear time

Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee

We initiate a systematic study of algorithms that are both differentially private and run in sublinear time for several problems in which the goal is to estimate natural graph parameters. Our main result is a differentially-private $(1+ρ)$-approximation algorithm for the problem of computing the average degree of a graph, for every $ρ>0$. The running time of the algorithm is roughly the same as its non-private version proposed by Goldreich and Ron (Sublinear Algorithms, 2005). We also obtain the first differentially-private sublinear-time approximation algorithms for the maximum matching size and the minimum vertex cover size of a graph. An overarching technique we employ is the notion of coupled global sensitivity of randomized algorithms. Related variants of this notion of sensitivity have been used in the literature in ad-hoc ways. Here we formalize the notion and develop it as a unifying framework for privacy analysis of randomized approximation algorithms.

LGDec 27, 2021
Differentially-Private Sublinear-Time Clustering

Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee

Clustering is an essential primitive in unsupervised machine learning. We bring forth the problem of sublinear-time differentially-private clustering as a natural and well-motivated direction of research. We combine the $k$-means and $k$-median sublinear-time results of Mishra et al. (SODA, 2001) and of Czumaj and Sohler (Rand. Struct. and Algorithms, 2007) with recent results on private clustering of Balcan et al. (ICML 2017), Gupta et al. (SODA, 2010) and Ghazi et al. (NeurIPS, 2020) to obtain sublinear-time private $k$-means and $k$-median algorithms via subsampling. We also investigate the privacy benefits of subsampling for group privacy.

QUANT-PHOct 8, 2021
The Parallel Reversible Pebbling Game: Analyzing the Post-Quantum Security of iMHFs

Jeremiah Blocki, Blake Holman, Seunghoon Lee

The classical (parallel) black pebbling game is a useful abstraction which allows us to analyze the resources (space, space-time, cumulative space) necessary to evaluate a function $f$ with a static data-dependency graph $G$. Of particular interest in the field of cryptography are data-independent memory-hard functions $f_{G,H}$ which are defined by a directed acyclic graph (DAG) $G$ and a cryptographic hash function $H$. The pebbling complexity of the graph $G$ characterizes the amortized cost of evaluating $f_{G,H}$ multiple times as well as the total cost to run a brute-force preimage attack over a fixed domain $\mathcal{X}$, i.e., given $y \in \{0,1\}^*$ find $x \in \mathcal{X}$ such that $f_{G,H}(x)=y$. While a classical attacker will need to evaluate the function $f_{G,H}$ at least $m=|\mathcal{X}|$ times a quantum attacker running Grover's algorithm only requires $\mathcal{O}(\sqrt{m})$ blackbox calls to a quantum circuit $C_{G,H}$ evaluating the function $f_{G,H}$. Thus, to analyze the cost of a quantum attack it is crucial to understand the space-time cost (equivalently width times depth) of the quantum circuit $C_{G,H}$. We first observe that a legal black pebbling strategy for the graph $G$ does not necessarily imply the existence of a quantum circuit with comparable complexity -- in contrast to the classical setting where any efficient pebbling strategy for $G$ corresponds to an algorithm with comparable complexity evaluating $f_{G,H}$. Motivated by this observation we introduce a new parallel reversible pebbling game which captures additional restrictions imposed by the No-Deletion Theorem in Quantum Computing. We apply our new reversible pebbling game to analyze the reversible space-time complexity of several important graphs: Line Graphs, Argon2i-A, Argon2i-B, and DRSample. (See the paper for the full abstract.)

DSOct 8, 2021
On Explicit Constructions of Extremely Depth Robust Graphs

Jeremiah Blocki, Mike Cinkoske, Seunghoon Lee et al.

A directed acyclic graph $G=(V,E)$ is said to be $(e,d)$-depth robust if for every subset $S \subseteq V$ of $|S| \leq e$ nodes the graph $G-S$ still contains a directed path of length $d$. If the graph is $(e,d)$-depth-robust for any $e,d$ such that $e+d \leq (1-ε)|V|$ then the graph is said to be $ε$-extreme depth-robust. In the field of cryptography, (extremely) depth-robust graphs with low indegree have found numerous applications including the design of side-channel resistant Memory-Hard Functions, Proofs of Space and Replication, and in the design of Computationally Relaxed Locally Correctable Codes. In these applications, it is desirable to ensure the graphs are locally navigable, i.e., there is an efficient algorithm $\mathsf{GetParents}$ running in time $\mathrm{polylog} |V|$ which takes as input a node $v \in V$ and returns the set of $v$'s parents. We give the first explicit construction of locally navigable $ε$-extreme depth-robust graphs with indegree $O(\log |V|)$. Previous constructions of $ε$-extreme depth-robust graphs either had indegree $\tildeω(\log^2 |V|)$ or were not explicit.

CRMay 29, 2021
Towards a Rigorous Statistical Analysis of Empirical Password Datasets

Jeremiah Blocki, Peiyuan Liu

A central challenge in password security is to characterize the attacker's guessing curve i.e., what is the probability that the attacker will crack a random user's password within the first $G$ guesses. A key challenge is that the guessing curve depends on the attacker's guessing strategy and the distribution of user passwords both of which are unknown to us. In this work we aim to follow Kerckhoffs' principle and analyze the performance of an optimal attacker who knows the password distribution. Let $λ_G$ denote the probability that such an attacker can crack a random user's password within $G$ guesses. We develop several statistically rigorous techniques to upper and lower bound $λ_G$ given $N$ independent samples from the unknown distribution. We show that our bounds hold with high confidence and apply our techniques to analyze eight password datasets. Our empirical analysis shows that even state-of-the-art password cracking models are often significantly less guess efficient than an attacker who can optimize its attack based on its (partial) knowledge of the password distribution. We also apply our techniques to re-examine the empirical password distribution and Zipf's Law. We find that the empirical distribution closely matches our bounds on $λ_G$ when $G$ is not too large i.e., $G \ll N$. However, for larger values of $G$ our empirical analysis rigorously demonstrates that the empirical distribution (resp. Zipf's Law) overestimates the attacker's success rate. We apply our techniques to upper/lower bound the effectiveness of password throttling mechanisms (key-stretching) which are used to reduce the number of attacker guesses $G$. Finally, if we make an additional assumption about the way users respond to password restrictions, we can use our techniques to evaluate the effectiveness of password composition policies which restrict the passwords users may select.

ITMar 25, 2021
Private and Resource-Bounded Locally Decodable Codes for Insertions and Deletions

Alexander R. Block, Jeremiah Blocki

We construct locally decodable codes (LDCs) to correct insertion-deletion errors in the setting where the sender and receiver share a secret key or where the channel is resource-bounded. Our constructions rely on a so-called "Hamming-to-InsDel" compiler (Ostrovsky and Paskin-Cherniavsky, ITS '15 & Block et al., FSTTCS '20), which compiles any locally decodable Hamming code into a locally decodable code resilient to insertion-deletion (InsDel) errors. While the compilers were designed for the classical coding setting, we show that the compilers still work in a secret key or resource-bounded setting. Applying our results to the private key Hamming LDC of Ostrovsky, Pandey, and Sahai (ICALP '07), we obtain a private key InsDel LDC with constant rate and polylogarithmic locality. Applying our results to the construction of Blocki, Kulkarni, and Zhou (ITC '20), we obtain similar results for resource-bounded channels; i.e., a channel where computation is constrained by resources such as space or time.

CRJan 25, 2021
DAHash: Distribution Aware Tuning of Password Hashing Costs

Wenjie Bai, Jeremiah Blocki

An attacker who breaks into an authentication server and steals all of the cryptographic password hashes is able to mount an offline-brute force attack against each user's password. Offline brute-force attacks against passwords are increasingly commonplace and the danger is amplified by the well documented human tendency to select low-entropy password and/or reuse these passwords across multiple accounts. Moderately hard password hashing functions are often deployed to help protect passwords against offline attacks by increasing the attacker's guessing cost. However, there is a limit to how "hard" one can make the password hash function as authentication servers are resource constrained and must avoid introducing substantial authentication delay. Observing that there is a wide gap in the strength of passwords selected by different users we introduce DAHash (Distribution Aware Password Hashing) a novel mechanism which reduces the number of passwords that an attacker will crack. Our key insight ishat a resource-constrained authentication server can dynamically tune the hardness parameters of a password hash function based on the (estimated) strength of the user's password. We introduce a Stackelberg game to model the interaction between a defender (authentication server) and an offline attacker. Our model allows the defender to optimize the parameters of DAHash e.g., specify how much effort is spent to hash weak/moderate/high strength passwords. We use several large scale password frequency datasets to empirically evaluate the effectiveness of our differentiated cost password hashing mechanism. We find that the defender who uses our mechanism can reduce the fraction of passwords that would be cracked by a rational offline attacker by around 15%.

CRSep 21, 2020
Password Strength Signaling: A Counter-Intuitive Defense Against Password Cracking

Wenjie Bai, Jeremiah Blocki, Ben Harsha

We introduce password strength information signaling as a novel, yet counter-intuitive, defense mechanism against password cracking attacks. Recent breaches have exposed billions of user passwords to the dangerous threat of offline password cracking attacks. An offline attacker can quickly check millions (or sometimes billions/trillions) of password guesses by comparing their hash value with the stolen hash from a breached authentication server. The attacker is limited only by the resources he is willing to invest. Our key idea is to have the authentication server store a (noisy) signal about the strength of each user password for an offline attacker to find. Surprisingly, we show that the noise distribution for the signal can often be tuned so that a rational (profit-maximizing) attacker will crack fewer passwords. The signaling scheme exploits the fact that password cracking is not a zero-sum game i.e., the attacker's profit is given by the value of the cracked passwords minus the total guessing cost. Thus, a well-defined signaling strategy will encourage the attacker to reduce his guessing costs by cracking fewer passwords. We use an evolutionary algorithm to compute the optimal signaling scheme for the defender. As a proof-of-concept, we evaluate our mechanism on several password datasets and show that it can reduce the total number of cracked passwords by up to $12\%$ (resp. $5\%$) of all users in defending against offline (resp. online) attacks.

CRJun 19, 2020
On the Security of Proofs of Sequential Work in a Post-Quantum World

Jeremiah Blocki, Seunghoon Lee, Samson Zhou

A Proof of Sequential Work (PoSW) allows a prover to convince a resource-bounded verifier that the prover invested a substantial amount of sequential time to perform some underlying computation. PoSWs have many applications including time-stamping, blockchain design, and universally verifiable CPU benchmarks. Mahmoody, Moran, and Vadhan (ITCS 2013) gave the first construction of a PoSW in the random oracle model though the construction relied on expensive depth-robust graphs. In a recent breakthrough, Cohen and Pietrzak (EUROCRYPT 2018) gave an efficient PoSW construction that does not require expensive depth-robust graphs. In the classical parallel random oracle model, it is straightforward to argue that any successful PoSW attacker must produce a long $\mathcal{H}$-sequence and that any malicious party running in sequential time $T-1$ will fail to produce an $\mathcal{H}$-sequence of length $T$ except with negligible probability. In this paper, we prove that any quantum attacker running in sequential time $T-1$ will fail to produce an $\mathcal{H}$-sequence except with negligible probability -- even if the attacker submits a large batch of quantum queries in each round. The proof is substantially more challenging and highlights the power of Zhandry's recent compressed oracle technique (CRYPTO 2019). We further extend this result to establish post-quantum security of a non-interactive PoSW obtained by applying the Fiat-Shamir transform to Cohen and Pietrzak's efficient construction (EUROCRYPT 2018).

CRJun 9, 2020
On the Economics of Offline Password Cracking

Jeremiah Blocki, Ben Harsha, Samson Zhou

We develop an economic model of an offline password cracker which allows us to make quantitative predictions about the fraction of accounts that a rational password attacker would crack in the event of an authentication server breach. We apply our economic model to analyze recent massive password breaches at Yahoo!, Dropbox, LastPass and AshleyMadison. All four organizations were using key-stretching to protect user passwords. In fact, LastPass' use of PBKDF2-SHA256 with $10^5$ hash iterations exceeds 2017 NIST minimum recommendation by an order of magnitude. Nevertheless, our analysis paints a bleak picture: the adopted key-stretching levels provide insufficient protection for user passwords. In particular, we present strong evidence that most user passwords follow a Zipf's law distribution, and characterize the behavior of a rational attacker when user passwords are selected from a Zipf's law distribution. We show that there is a finite threshold which depends on the Zipf's law parameters that characterizes the behavior of a rational attacker -- if the value of a cracked password (normalized by the cost of computing the password hash function) exceeds this threshold then the adversary's optimal strategy is always to continue attacking until each user password has been cracked. In all cases (Yahoo!, Dropbox, LastPass and AshleyMadison) we find that the value of a cracked password almost certainly exceeds this threshold meaning that a rational attacker would crack all passwords that are selected from the Zipf's law distribution (i.e., most user passwords). This prediction holds even if we incorporate an aggressive model of diminishing returns for the attacker (e.g., the total value of $500$ million cracked passwords is less than $100$ times the total value of $5$ million passwords). See paper for full abstract.

CRMay 18, 2020
DALock: Distribution Aware Password Throttling

Jeremiah Blocki, Wuwei Zhang

Large-scale online password guessing attacks are wide-spread and continuously qualified as one of the top cyber-security risks. The common method for mitigating the risk of online cracking is to lock out the user after a fixed number ($K$) of consecutive incorrect login attempts. Selecting the value of $K$ induces a classic security-usability trade-off. When $K$ is too large a hacker can (quickly) break into a significant fraction of user accounts, but when $K$ is too low we will start to annoy honest users by locking them out after a few mistakes. Motivated by the observation that honest user mistakes typically look quite different than the password guesses of an online attacker, we introduce DALock a {\em distribution aware} password lockout mechanism to reduce user annoyance while minimizing user risk. As the name suggests, DALock is designed to be aware of the frequency and popularity of the password used for login attacks while standard throttling mechanisms (e.g., $K$-strikes) are oblivious to the password distribution. In particular, DALock maintains an extra "hit count" in addition to "strike count" for each user which is based on (estimates of) the cumulative probability of {\em all} login attempts for that particular account. We empirically evaluate DALock with an extensive battery of simulations using real world password datasets. In comparison with the traditional $K$-strikes mechanism we find that DALock offers a superior security/usability trade-off. For example, in one of our simulations we are able to reduce the success rate of an attacker to $0.05\%$ (compared to $1\%$ for the $10$-strikes mechanism) whilst simultaneously reducing the unwanted lockout rate for accounts that are not under attack to just $0.08\%$ (compared to $4\%$ for the $3$-strikes mechanism).

CRMay 12, 2020
An Economic Model for Quantum Key-Recovery Attacks against Ideal Ciphers

Benjamin Harsha, Jeremiah Blocki

It has been established that quantum algorithms can solve several key cryptographic problems more efficiently than classical computers. As progress continues in the field of quantum computing it is important to understand the risks they pose to deployed cryptographic systems. Here we focus on one of these risks - quantum key-recovery attacks against ideal ciphers. Specifically, we seek to model the risk posed by an economically motivated quantum attacker who will choose to run a quantum key-recovery attack against an ideal cipher if the cost to recover the secret key is less than the value of the information at the time when the key-recovery attack is complete. In our analysis we introduce the concept of a quantum cipher circuit year to measure the cost of a quantum attack. This concept can be used to model the inherent tradeoff between the total time to run a quantum key recovery attack and the total work required to run said attack. Our model incorporates the time value of the encrypted information to predict whether any time/work tradeoff results in a key-recovery attack with positive utility for the attacker. We make these predictions under various projections of advances in quantum computing. We use these predictions to make recommendations for the future use and deployment of symmetric key ciphers to secure information against these quantum key-recovery attacks. We argue that, even with optimistic predictions for advances in quantum computing, 128 bit keys (as used in common cipher implementations like AES-128) provide adequate security against quantum attacks in almost all use cases.

CRFeb 4, 2020
Bicycle Attacks Considered Harmful: Quantifying the Damage of Widespread Password Length Leakage

Benjamin Harsha, Robert Morton, Jeremiah Blocki et al.

We examine the issue of password length leakage via encrypted traffic i.e., bicycle attacks. We aim to quantify both the prevalence of password length leakage bugs as well as the potential harm to users. In an observational study, we find that {\em most} of the Alexa top 100 rates sites are vulnerable to bicycle attacks meaning that an eavesdropping attacker can infer the exact length of a password based on the length the encrypted packet containing the password. We discuss several ways in which an eavesdropping attacker could link this password length with a particular user account e.g., a targeted campaign against a smaller group of users or via DNS hijacking for larger scale campaigns. We next use a decision-theoretic model to quantify the extent to which password length leakage might help an attacker to crack user passwords. In our analysis, we consider three different levels of password attackers: hacker, criminal and nation-state. In all cases, we find that such an attacker who knows the length of each user password gains a significant advantage over one without knowing the password length. As part of this analysis, we also release a new differentially private password frequency dataset from the 2016 LinkedIn breach using a differentially private algorithm of Blocki et al. (NDSS 2016) to protect user accounts. The LinkedIn frequency corpus is based on over 170 million passwords making it the largest frequency corpus publicly available to password researchers. While the defense against bicycle attacks is straightforward (i.e., ensure that passwords are always padded before encryption), we discuss several practical challenges organizations may face when attempting to patch this vulnerability. We advocate for a new W3C standard on how password fields are handled which would effectively eliminate most instances of password length leakage.

CRNov 15, 2019
Computationally Data-Independent Memory Hard Functions

Mohammad Hassan Ameri, Jeremiah Blocki, Samson Zhou

Memory hard functions (MHFs) are an important cryptographic primitive that are used to design egalitarian proofs of work and in the construction of moderately expensive key-derivation functions resistant to brute-force attacks. Broadly speaking, MHFs can be divided into two categories: data-dependent memory hard functions (dMHFs) and data-independent memory hard functions (iMHFs). iMHFs are resistant to certain side-channel attacks as the memory access pattern induced by the honest evaluation algorithm is independent of the potentially sensitive input e.g., password. While dMHFs are potentially vulnerable to side-channel attacks (the induced memory access pattern might leak useful information to a brute-force attacker), they can achieve higher cumulative memory complexity (CMC) in comparison than an iMHF. In this paper, we introduce the notion of computationally data-independent memory hard functions (ciMHFs). Intuitively, we require that memory access pattern induced by the (randomized) ciMHF evaluation algorithm appears to be independent from the standpoint of a computationally bounded eavesdropping attacker --- even if the attacker selects the initial input. We then ask whether it is possible to circumvent known upper bound for iMHFs and build a ciMHF with CMC $Ω(N^2)$. Surprisingly, we answer the question in the affirmative when the ciMHF evaluation algorithm is executed on a two-tiered memory architecture (RAM/Cache). See paper for the full abstract.

DSOct 20, 2019
A New Connection Between Node and Edge Depth Robust Graphs

Jeremiah Blocki, Mike Cinkoske

Given a directed acyclic graph (DAG) $G = (V,E)$, we say that $G$ is $(e,d)$-depth-robust (resp. $(e,d)$-edge-depth-robust) if for any set $S \subset V$ (resp. $S \subseteq E$) of at most $|S| \leq e$ nodes (resp. edges) the graph $G-S$ contains a directed path of length $d$. While edge-depth-robust graphs are potentially easier to construct many applications in cryptography require node depth-robust graphs with small indegree. We create a graph reduction that transforms an $(e, d)$-edge-depth-robust graph with $m$ edges into a $(e/2,d)$-depth-robust graph with $O(m)$ nodes and constant indegree. One immediate consequence of this result is the first construction of a provably $(\frac{n \log \log n}{\log n}, \frac{n}{(\log n)^{1 + \log \log n}})$-depth-robust graph with constant indegree, where previous constructions for $e =\frac{n \log \log n}{\log n}$ had $d = O(n^{1-ε})$. Our reduction crucially relies on ST-Robust graphs, a new graph property we introduce which may be of independent interest. We say that a directed, acyclic graph with $n$ inputs and $n$ outputs is $(k_1, k_2)$-ST-Robust if we can remove any $k_1$ nodes and there exists a subgraph containing at least $k_2$ inputs and $k_2$ outputs such that each of the $k_2$ inputs is connected to all of the $k_2$ outputs. If the graph if $(k_1,n-k_1)$-ST-Robust for all $k_1 \leq n$ we say that the graph is maximally ST-robust. We show how to construct maximally ST-robust graphs with constant indegree and $O(n)$ nodes. Given a family $ \mathbb{M}$ of ST-robust graphs and an arbitrary $(e, d)$-edge-depth-robust graph $G$ we construct a new constant-indegree graph $ \mathrm{Reduce}(G, \mathbb{M})$ by replacing each node in $G$ with an ST-robust graph from $ \mathbb{M}$. We also show that ST-robust graphs can be used to construct (tight) proofs-of-space and (asymptotically) improved wide-block labeling functions.

CRSep 25, 2019
On Locally Decodable Codes in Resource Bounded Channels

Jeremiah Blocki, Shubhang Kulkarni, Samson Zhou

Constructions of locally decodable codes (LDCs) have one of two undesirable properties: low rate or high locality (polynomial in the length of the message). In settings where the encoder/decoder have already exchanged cryptographic keys and the channel is a probabilistic polynomial time (PPT) algorithm, it is possible to circumvent these barriers and design LDCs with constant rate and small locality. However, the assumption that the encoder/decoder have exchanged cryptographic keys is often prohibitive. We thus consider the problem of designing explicit and efficient LDCs in settings where the channel is slightly more constrained than the encoder/decoder with respect to some resource e.g., space or (sequential) time. Given an explicit function $f$ that the channel cannot compute, we show how the encoder can transmit a random secret key to the local decoder using $f(\cdot)$ and a random oracle $H(\cdot)$. This allows bootstrap from the private key LDC construction of Ostrovsky, Pandey and Sahai (ICALP, 2007), thereby answering an open question posed by Guruswami and Smith (FOCS 2010) of whether such bootstrapping techniques may apply to LDCs in weaker channel models than just PPT algorithms. Specifically, in the random oracle model we show how to construct explicit constant rate LDCs with locality of polylog in the security parameter against various resource constrained channels.

CCApr 17, 2019
Approximating Cumulative Pebbling Cost is Unique Games Hard

Jeremiah Blocki, Seunghoon Lee, Samson Zhou

The cumulative pebbling complexity of a directed acyclic graph $G$ is defined as $\mathsf{cc}(G) = \min_P \sum_i |P_i|$, where the minimum is taken over all legal (parallel) black pebblings of $G$ and $|P_i|$ denotes the number of pebbles on the graph during round $i$. Intuitively, $\mathsf{cc}(G)$ captures the amortized Space-Time complexity of pebbling $m$ copies of $G$ in parallel. The cumulative pebbling complexity of a graph $G$ is of particular interest in the field of cryptography as $\mathsf{cc}(G)$ is tightly related to the amortized Area-Time complexity of the Data-Independent Memory-Hard Function (iMHF) $f_{G,H}$ [AS15] defined using a constant indegree directed acyclic graph (DAG) $G$ and a random oracle $H(\cdot)$. A secure iMHF should have amortized Space-Time complexity as high as possible, e.g., to deter brute-force password attacker who wants to find $x$ such that $f_{G,H}(x) = h$. Thus, to analyze the (in)security of a candidate iMHF $f_{G,H}$, it is crucial to estimate the value $\mathsf{cc}(G)$ but currently, upper and lower bounds for leading iMHF candidates differ by several orders of magnitude. Blocki and Zhou recently showed that it is $\mathsf{NP}$-Hard to compute $\mathsf{cc}(G)$, but their techniques do not even rule out an efficient $(1+\varepsilon)$-approximation algorithm for any constant $\varepsilon>0$. We show that for any constant $c > 0$, it is Unique Games hard to approximate $\mathsf{cc}(G)$ to within a factor of $c$. (See the paper for the full abstract.)

CRMay 15, 2017
Sustained Space Complexity

Joel Alwen, Jeremiah Blocki, Krzysztof Pietrzak

Memory-hard functions (MHF) are functions whose evaluation cost is dominated by memory cost. MHFs are egalitarian, in the sense that evaluating them on dedicated hardware (like FPGAs or ASICs) is not much cheaper than on off-the-shelf hardware (like x86 CPUs). MHFs have interesting cryptographic applications, most notably to password hashing and securing blockchains. Alwen and Serbinenko [STOC'15] define the cumulative memory complexity (cmc) of a function as the sum (over all time-steps) of the amount of memory required to compute the function. They advocate that a good MHF must have high cmc. Unlike previous notions, cmc takes into account that dedicated hardware might exploit amortization and parallelism. Still, cmc has been critizised as insufficient, as it fails to capture possible time-memory trade-offs, as memory cost doesn't scale linearly, functions with the same cmc could still have very different actual hardware cost. In this work we address this problem, and introduce the notion of sustained-memory complexity, which requires that any algorithm evaluating the function must use a large amount of memory for many steps. We construct functions (in the parallel random oracle model) whose sustained-memory complexity is almost optimal: our function can be evaluated using $n$ steps and $O(n/\log(n))$ memory, in each step making one query to the (fixed-input length) random oracle, while any algorithm that can make arbitrary many parallel queries to the random oracle, still needs $Ω(n/\log(n))$ memory for $Ω(n)$ steps. Our main technical contribution is the construction is a family of DAGs on $n$ nodes with constant indegree with high "sustained-space complexity", meaning that any parallel black-pebbling strategy requires $Ω(n/\log(n))$ pebbles for at least $Ω(n)$ steps.

CRMay 12, 2017
Optimizing Locally Differentially Private Protocols

Tianhao Wang, Jeremiah Blocki, Ninghui Li et al.

Protocols satisfying Local Differential Privacy (LDP) enable parties to collect aggregate information about a population while protecting each user's privacy, without relying on a trusted third party. LDP protocols (such as Google's RAPPOR) have been deployed in real-world scenarios. In these protocols, a user encodes his private information and perturbs the encoded value locally before sending it to an aggregator, who combines values that users contribute to infer statistics about the population. In this paper, we introduce a framework that generalizes several LDP protocols proposed in the literature. Our framework yields a simple and fast aggregation algorithm, whose accuracy can be precisely analyzed. Our in-depth analysis enables us to choose optimal parameters, resulting in two new protocols (i.e., Optimized Unary Encoding and Optimized Local Hashing) that provide better utility than protocols previously proposed. We present precise conditions for when each proposed protocol should be used, and perform experiments that demonstrate the advantage of our proposed protocols.

CRSep 14, 2016
On the Computational Complexity of Minimal Cumulative Cost Graph Pebbling

Jeremiah Blocki, Samson Zhou

We consider the computational complexity of finding a legal black pebbling of a DAG $G=(V,E)$ with minimum cumulative cost. A black pebbling is a sequence $P_0,\ldots, P_t \subseteq V$ of sets of nodes which must satisfy the following properties: $P_0 = \emptyset$ (we start off with no pebbles on $G$), $\mathsf{sinks}(G) \subseteq \bigcup_{j \leq t} P_j$ (every sink node was pebbled at some point) and $\mathsf{parents}\big(P_{i+1}\backslash P_i\big) \subseteq P_i$ (we can only place a new pebble on a node $v$ if all of $v$'s parents had a pebble during the last round). The cumulative cost of a pebbling $P_0,P_1,\ldots, P_t \subseteq V$ is $\mathsf{cc}(P) = | P_1| + \ldots + | P_t|$. The cumulative pebbling cost is an especially important security metric for data-independent memory hard functions, an important primitive for password hashing. Thus, an efficient (approximation) algorithm would be an invaluable tool for the cryptanalysis of password hash functions as it would provide an automated tool to establish tight bounds on the amortized space-time cost of computing the function. We show that such a tool is unlikely to exist. In particular, we prove the following results. (1) It is $\texttt{NP}\mbox{-}\texttt{Hard}$ to find a pebbling minimizing cumulative cost. (2) The natural linear program relaxation for the problem has integrality gap $\tilde{O}(n)$, where $n$ is the number of nodes in $G$. We conjecture that the problem is hard to approximate. (3) We show that a related problem, find the minimum size subset $S\subseteq V$ such that $\textsf{depth}(G-S) \leq d$, is also $\texttt{NP}\mbox{-}\texttt{Hard}$. In fact, under the unique games conjecture there is no $(2-ε)$-approximation algorithm.

CRMar 2, 2016
Client-CASH: Protecting Master Passwords against Offline Attacks

Jeremiah Blocki, Anirudh Sridhar

Offline attacks on passwords are increasingly commonplace and dangerous. An offline adversary is limited only by the amount of computational resources he or she is willing to invest to crack a user's password. The danger is compounded by the existence of authentication servers who fail to adopt proper password storage practices like key-stretching. Password managers can help mitigate these risks by adopting key stretching procedures like hash iteration or memory hard functions to derive site specific passwords from the user's master password on the client-side. While key stretching can reduce the offline adversary's success rate, these procedures also increase computational costs for a legitimate user. Motivated by the observation that most of the password guesses of the offline adversary will be incorrect, we propose a client side cost asymmetric secure hashing scheme (Client-CASH). Client-CASH randomizes the runtime of client-side key stretching procedure in a way that the expected computational cost of our key derivation function is greater when run with an incorrect master password. We make several contributions. First, we show how to introduce randomness into a client-side key stretching algorithms through the use of halting predicates which are selected randomly at the time of account creation. Second, we formalize the problem of finding the optimal running time distribution subject to certain cost constraints for the client and certain security constrains on the halting predicates. Finally, we demonstrate that Client-CASH can reduce the adversary's success rate by up to $21\%$. These results demonstrate the promise of the Client-CASH mechanism.

CRSep 1, 2015
CASH: A Cost Asymmetric Secure Hash Algorithm for Optimal Password Protection

Jeremiah Blocki, Anupam Datta

An adversary who has obtained the cryptographic hash of a user's password can mount an offline attack to crack the password by comparing this hash value with the cryptographic hashes of likely password guesses. This offline attacker is limited only by the resources he is willing to invest to crack the password. Key-stretching tools can help mitigate the threat of offline attacks by making each password guess more expensive for the adversary to verify. However, key-stretching increases authentication costs for a legitimate authentication server. We introduce a novel Stackelberg game model which captures the essential elements of this interaction between a defender and an offline attacker. We then introduce Cost Asymmetric Secure Hash (CASH), a randomized key-stretching mechanism that minimizes the fraction of passwords that would be cracked by a rational offline attacker without increasing amortized authentication costs for the legitimate authentication server. CASH is motivated by the observation that the legitimate authentication server will typically run the authentication procedure to verify a correct password, while an offline adversary will typically use incorrect password guesses. By using randomization we can ensure that the amortized cost of running CASH to verify a correct password guess is significantly smaller than the cost of rejecting an incorrect password. Using our Stackelberg game framework we can quantify the quality of the underlying CASH running time distribution in terms of the fraction of passwords that a rational offline adversary would crack. We provide an efficient algorithm to compute high quality CASH distributions for the defender. Finally, we analyze CASH using empirical data from two large scale password frequency datasets. Our analysis shows that CASH can significantly reduce (up to $50\%$) the fraction of password cracked by a rational offline adversary.

CROct 6, 2014
Spaced Repetition and Mnemonics Enable Recall of Multiple Strong Passwords

Jeremiah Blocki, Saranga Komanduri, Lorrie Cranor et al.

We report on a user study that provides evidence that spaced repetition and a specific mnemonic technique enable users to successfully recall multiple strong passwords over time. Remote research participants were asked to memorize 4 Person-Action-Object (PAO) stories where they chose a famous person from a drop-down list and were given machine-generated random action-object pairs. Users were also shown a photo of a scene and asked to imagine the PAO story taking place in the scene (e.g., Bill Gates---swallowing---bike on a beach). Subsequently, they were asked to recall the action-object pairs when prompted with the associated scene-person pairs following a spaced repetition schedule over a period of 127+ days. While we evaluated several spaced repetition schedules, the best results were obtained when users initially returned after 12 hours and then in $1.5\times$ increasing intervals: 77% of the participants successfully recalled all 4 stories in 10 tests over a period of 158 days. Much of the forgetting happened in the first test period (12 hours): 89% of participants who remembered their stories during the first test period successfully remembered them in every subsequent round. These findings, coupled with recent results on naturally rehearsing password schemes, suggest that 4 PAO stories could be used to create usable and strong passwords for 14 sensitive accounts following this spaced repetition schedule, possibly with a few extra upfront rehearsals. In addition, we find that there is an interference effect across multiple PAO stories: the recall rate of 100% (resp. 90%) for participants who were asked to memorize 1 PAO story (resp. 2 PAO stories) is significantly better than the recall rate for participants who were asked to memorize 4 PAO stories. These findings yield concrete advice for improving constructions of password management schemes and future user studies.

GTSep 16, 2014
Audit Games with Multiple Defender Resources

Jeremiah Blocki, Nicolas Christin, Anupam Datta et al.

Modern organizations (e.g., hospitals, social networks, government agencies) rely heavily on audit to detect and punish insiders who inappropriately access and disclose confidential information. Recent work on audit games models the strategic interaction between an auditor with a single audit resource and auditees as a Stackelberg game, augmenting associated well-studied security games with a configurable punishment parameter. We significantly generalize this audit game model to account for multiple audit resources where each resource is restricted to audit a subset of all potential violations, thus enabling application to practical auditing scenarios. We provide an FPTAS that computes an approximately optimal solution to the resulting non-convex optimization problem. The main technical novelty is in the design and correctness proof of an optimization transformation that enables the construction of this FPTAS. In addition, we experimentally demonstrate that this transformation significantly speeds up computation of solutions for a class of audit games and security games.

CRMar 31, 2014
Towards Human Computable Passwords

Jeremiah Blocki, Manuel Blum, Anupam Datta et al.

An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer --- a computer that stores information and performs computations correctly but does not provide confidentiality. Our schemes use a semi-trusted computer to store and display public challenges $C_i\in[n]^k$. The human user memorizes a random secret mapping $σ:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses $f(σ(C_i))$ to a sequence of public challenges where $f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the human to evaluate. We prove that any statistical adversary needs to sample $m=\tildeΩ(n^{s(f)})$ challenge-response pairs to recover $σ$, for a security parameter $s(f)$ that depends on two key properties of $f$. To obtain our results, we apply the general hypercontractivity theorem to lower bound the statistical dimension of the distribution over challenge-response pairs induced by $f$ and $σ$. Our lower bounds apply to arbitrary functions $f $ (not just to functions that are easy for a human to evaluate), and generalize recent results of Feldman et al. As an application, we propose a family of human computable password functions $f_{k_1,k_2}$ in which the user needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or remembering $σ(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$. For these schemes, we prove that forging passwords is equivalent to recovering the secret mapping. Thus, our human computable password schemes can maintain strong security guarantees even after an adversary has observed the user login to many different accounts.

CROct 4, 2013
GOTCHA Password Hackers!

Jeremiah Blocki, Manuel Blum, Anupam Datta

We introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack. Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme.

GTMar 2, 2013
Audit Games

Jeremiah Blocki, Nicolas Christin, Anupam Datta et al.

Effective enforcement of laws and policies requires expending resources to prevent and detect offenders, as well as appropriate punishment schemes to deter violators. In particular, enforcement of privacy laws and policies in modern organizations that hold large volumes of personal information (e.g., hospitals, banks, and Web services providers) relies heavily on internal audit mechanisms. We study economic considerations in the design of these mechanisms, focusing in particular on effective resource allocation and appropriate punishment schemes. We present an audit game model that is a natural generalization of a standard security game model for resource allocation with an additional punishment parameter. Computing the Stackelberg equilibrium for this game is challenging because it involves solving an optimization problem with non-convex quadratic constraints. We present an additive FPTAS that efficiently computes a solution that is arbitrarily close to the optimal solution.

CRFeb 20, 2013
Naturally Rehearsing Passwords

Jeremiah Blocki, Manuel Blum, Anupam Datta

We introduce quantitative usability and security models to guide the design of password management schemes --- systematic strategies to help users create and remember multiple passwords. In the same way that security proofs in cryptography are based on complexity-theoretic assumptions (e.g., hardness of factoring and discrete logarithm), we quantify usability by introducing usability assumptions. In particular, password management relies on assumptions about human memory, e.g., that a user who follows a particular rehearsal schedule will successfully maintain the corresponding memory. These assumptions are informed by research in cognitive science and validated through empirical studies. Given rehearsal requirements and a user's visitation schedule for each account, we use the total number of extra rehearsals that the user would have to do to remember all of his passwords as a measure of the usability of the password scheme. Our usability model leads us to a key observation: password reuse benefits users not only by reducing the number of passwords that the user has to memorize, but more importantly by increasing the natural rehearsal rate for each password. We also present a security model which accounts for the complexity of password management with multiple accounts and associated threats, including online, offline, and plaintext password leak attacks. Observing that current password management schemes are either insecure or unusable, we present Shared Cues--- a new scheme in which the underlying secret is strategically shared across accounts to ensure that most rehearsal requirements are satisfied naturally while simultaneously providing strong security. The construction uses the Chinese Remainder Theorem to achieve these competing goals.

CRFeb 20, 2013
Optimizing Password Composition Policies

Jeremiah Blocki, Saranga Komanduri, Ariel Procaccia et al.

A password composition policy restricts the space of allowable passwords to eliminate weak passwords that are vulnerable to statistical guessing attacks. Usability studies have demonstrated that existing password composition policies can sometimes result in weaker password distributions; hence a more principled approach is needed. We introduce the first theoretical model for optimizing password composition policies. We study the computational and sample complexity of this problem under different assumptions on the structure of policies and on users' preferences over passwords. Our main positive result is an algorithm that -- with high probability --- constructs almost optimal policies (which are specified as a union of subsets of allowed passwords), and requires only a small number of samples of users' preferred passwords. We complement our theoretical results with simulations using a real-world dataset of 32 million passwords.

CRAug 22, 2012
Differentially Private Data Analysis of Social Networks via Restricted Sensitivity

Jeremiah Blocki, Avrim Blum, Anupam Datta et al.

We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H) then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be inaccurate. We demonstrate the usefulness of this notion by considering the task of answering queries regarding social-networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of H as graphs of bounded degree, we exhibit efficient ways of constructing f_H using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., number of triangles) and local profile queries (e.g., number of people who know a spy and a computer-scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not D is in H, while providing more accurate results in the event that H holds true.