DS · Jul 10, 2019
Approximate Voronoi cells for lattices, revisited
Thijs Laarhoven
We revisit the approximate Voronoi cells approach for solving the closest vector problem with preprocessing (CVPP) on high-dimensional lattices, and settle the open problem of Doulgerakis-Laarhoven-De Weger [PQCrypto, 2019] of determining exact asymptotics on the volume of these Voronoi cells under the Gaussian heuristic. As a result, we obtain improved upper bounds on the time complexity of the randomized iterative slicer when using less than $2^{0.076d + o(d)}$ memory, and we show how to obtain time-memory trade-offs even when using less than $2^{0.048d + o(d)}$ memory. We also settle the open problem of obtaining a continuous trade-off between the size of the advice and the query time complexity, as the time complexity with subexponential advice in our approach scales as $d^{d/2 + o(d)}$, matching worst-case enumeration bounds, and achieving the same asymptotic scaling as average-case enumeration algorithms for the closest vector problem.
DS · Jul 10, 2019
Evolutionary techniques in lattice sieving algorithms
Thijs Laarhoven
Lattice-based cryptography has recently emerged as a prominent candidate for secure communication in the quantum age. Its security relies on the hardness of certain lattice problems, and the inability of known lattice algorithms, such as lattice sieving, to solve these problems efficiently. In this paper we investigate the similarities between lattice sieving and evolutionary algorithms, how various improvements to lattice sieving can be viewed as applications of known techniques from evolutionary computation, and how other evolutionary techniques can benefit lattice sieving in practice.
DS · Jul 10, 2019
Polytopes, lattices, and spherical codes for the nearest neighbor problem
Thijs Laarhoven
We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a low-dimensional subspace, and then partitioning the projected vectors according to Voronoi cells induced by a suitable spherical code. This approach generalizes and interpolates between the fast but suboptimal hyperplane hashing of Charikar [STOC'02] and the asymptotically optimal but practically often slower hash families of Andoni-Indyk [FOCS'06], Andoni-Indyk-Nguyen-Razenshteyn [SODA'14] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS'15]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes from the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide order terms which vanish only for large $d$, facilitating an easy parameter selection. For the two-dimensional case, we derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron, $5$-cell, and $16$-cell achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families $A_k$ and $D_k$. We argue that in higher dimensions, larger spherical codes will likely exist which will outperform orthoplices in theory, and we argue why using the $D_k$ root lattices will likely lead to better results in practice, due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.
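The project-then-partition idea can be illustrated for the two-dimensional triangle code. The following is a minimal sketch, not the paper's implementation: the name `make_triangle_hash`, the Gaussian projection matrix, and the seed handling are assumptions made for illustration.

```python
import math
import random

def make_triangle_hash(d, seed=0):
    """Return a hash function mapping d-dimensional vectors to {0, 1, 2}."""
    rng = random.Random(seed)
    # Assumption: a 2 x d Gaussian matrix serves as the random projection.
    proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(2)]
    # Spherical code: the vertices of an equilateral triangle on the unit circle.
    code = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
            for k in range(3)]

    def h(x):
        # Project x down to the plane.
        y0 = sum(a * b for a, b in zip(proj[0], x))
        y1 = sum(a * b for a, b in zip(proj[1], x))
        # The Voronoi cell containing the projection belongs to the
        # code word with the largest inner product.
        return max(range(3), key=lambda k: code[k][0] * y0 + code[k][1] * y1)

    return h
```

The hash depends only on the direction of the projected vector, so positive rescaling of the input leaves it unchanged, as expected for an angular-distance method.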
CR · Feb 17, 2019
Nearest neighbor decoding for Tardos fingerprinting codes
Thijs Laarhoven
Over the past decade, various improvements have been made to Tardos' collusion-resistant fingerprinting scheme [Tardos, STOC 2003], ultimately resulting in a good understanding of what is the minimum code length required to achieve collusion-resistance. In contrast, decreasing the cost of the actual decoding algorithm for identifying the potential colluders has received less attention, even though previous results have shown that using joint decoding strategies, deemed too expensive for decoding, may lead to better code lengths. Moreover, in dynamic settings a fast decoder may be required to provide answers in real-time, further raising the question whether the decoding costs of score-based fingerprinting schemes can be decreased with a smarter decoding algorithm. In this paper we show how to model the decoding step of score-based fingerprinting as a nearest neighbor search problem, and how this relation allows us to apply techniques from the field of (approximate) nearest neighbor searching to obtain decoding times which are sublinear in the total number of users. As this does not affect the encoding and embedding steps, this decoding mechanism can easily be deployed within existing fingerprinting schemes, and this may bring a truly efficient joint decoder closer to reality. Besides the application to fingerprinting, similar techniques can be used to decrease the decoding costs of group testing methods, which may be of independent interest.
DS · Dec 8, 2017
Graph-based time-space trade-offs for approximate near neighbors
Thijs Laarhoven
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size $n = 2^{o(d)}$ on the $d$-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor $c > 1$ in query time $n^{ρ_q + o(1)}$ and space $n^{1 + ρ_s + o(1)}$, for arbitrary $ρ_q, ρ_s \geq 0$ satisfying \begin{align} (2c^2 - 1) ρ_q + 2 c^2 (c^2 - 1) \sqrt{ρ_s (1 - ρ_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small $c$ and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the recent optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [SODA'17]. We further study how the trade-offs scale when the data set is of size $n = 2^{Θ(d)}$, and analyze asymptotic complexities when applying these results to lattice sieving.
DS · May 8, 2017
Faster tuple lattice sieving using spherical locality-sensitive filters
Thijs Laarhoven
To overcome the large memory requirement of classical lattice sieving algorithms for solving hard lattice problems, Bai-Laarhoven-Stehlé [ANTS 2016] studied tuple lattice sieving, where tuples instead of pairs of lattice vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017] recently improved upon their results for arbitrary tuple sizes, for example showing that a triple sieve can solve the shortest vector problem (SVP) in dimension $d$ in time $2^{0.3717d + o(d)}$, using a technique similar to locality-sensitive hashing for finding nearest neighbors. In this work, we generalize the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near neighbor searching on dense data sets, and we apply these techniques to tuple lattice sieving to obtain even better time complexities. For instance, our triple sieve heuristically solves SVP in time $2^{0.3588d + o(d)}$. For practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this shows that a triple sieve uses less space and less time than the current best near-linear space double sieve.
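The difference between combining pairs and combining tuples can be sketched with a brute-force reduction step. This is only an illustration of the combining step under simplifying assumptions (exhaustive search over sign patterns, no filtering or nearest neighbor speed-ups); `norm_sq` and `reduce_tuple` are hypothetical names.

```python
from itertools import product

def norm_sq(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)

def reduce_tuple(v, others):
    """Return the shortest nonzero vector of the form v + sum(s_i * w_i),
    with each s_i in {-1, 0, 1} and w_i taken from `others`."""
    best = list(v)
    for signs in product((-1, 0, 1), repeat=len(others)):
        cand = list(v)
        for s, w in zip(signs, others):
            cand = [a + s * b for a, b in zip(cand, w)]
        # Skip the zero vector, which is not a valid short lattice vector.
        if 0 < norm_sq(cand) < norm_sq(best):
            best = cand
    return best
```

With `others` holding one vector this is a pairwise (double sieve) reduction; with two vectors it is a triple reduction, which can find a shorter combination that no single pair yields.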
DS · Feb 19, 2017
Hypercube LSH for approximate near neighbors
Thijs Laarhoven
A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions. In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability $(\frac{1}{π})^{d + o(d)}$ in dimension $d$, compared to $(\frac{1}{2})^d$ when using random hyperplanes. Vectors at angle $\frac{π}{3}$ collide with probability $(\frac{\sqrt{3}}{π})^{d + o(d)}$, compared to $(\frac{2}{3})^d$ for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases. For $c$-approximate nearest neighbor searching, this translates to a decrease in the exponent $ρ$ of locality-sensitive hashing (LSH) methods of a factor up to $\log_2(π) \approx 1.652$ compared to hyperplane LSH. For $c = 2$, we obtain $ρ\approx 0.302 + o(1)$ for hypercube LSH, improving upon the $ρ\approx 0.377$ for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.
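The hyperplane LSH figures quoted above can be checked numerically. The sketch below assumes the usual random-data setting on the unit sphere, in which far points are orthogonal (distance $\sqrt{2}$) and near points lie at distance $\sqrt{2}/c$; that setting and the function names are assumptions, not taken from the abstract.

```python
import math

def hyperplane_collision_prob(theta):
    """Probability that one random hyperplane does not separate
    two vectors at angle theta (in radians)."""
    return 1.0 - theta / math.pi

def hyperplane_rho(c):
    """LSH exponent rho = log(1/p_near) / log(1/p_far) for hyperplane LSH."""
    # Assumption: near points at Euclidean distance sqrt(2)/c on the unit
    # sphere, i.e. at angle 2 * asin(distance / 2).
    theta_near = 2.0 * math.asin(math.sqrt(2.0) / (2.0 * c))
    p_near = hyperplane_collision_prob(theta_near)
    # Assumption: far points are orthogonal.
    p_far = hyperplane_collision_prob(math.pi / 2.0)
    return math.log(1.0 / p_near) / math.log(1.0 / p_far)
```

Under these assumptions, `hyperplane_collision_prob(math.pi / 3)` gives the per-hyperplane value $\frac{2}{3}$, and `hyperplane_rho(2.0)` comes out close to the 0.377 quoted for hyperplane LSH at $c = 2$.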
DS · Aug 11, 2016
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn et al.
[See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + ρ_u + o(1)} + O(dn)$ and query time $n^{ρ_q + o(1)} + d n^{o(1)}$ for every $ρ_u, ρ_q \geq 0$ such that: \begin{equation} c^2 \sqrt{ρ_q} + (c^2 - 1) \sqrt{ρ_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $ρ_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
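To get a feel for the trade-off curve, the equation can be solved for $ρ_q$ given $c$ and $ρ_u$. This is a small numerical sketch that ignores the $o(1)$ terms; the function name `query_exponent` and the tolerance used for clamping are assumptions.

```python
import math

def query_exponent(c, rho_u):
    """Solve c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2 c^2 - 1) for rho_q."""
    rhs = math.sqrt(2 * c * c - 1) - (c * c - 1) * math.sqrt(rho_u)
    if rhs < -1e-12:
        # Past this point the curve has already reached rho_q = 0.
        raise ValueError("rho_u exceeds the value at which rho_q reaches 0")
    rhs = max(rhs, 0.0)  # absorb floating-point rounding at the endpoint
    return (rhs / (c * c)) ** 2
```

For $c = 2$, near-linear space ($ρ_u = 0$) gives $ρ_q = 7/16$, and the balanced point $ρ_u = ρ_q$ sits at $1/(2c^2 - 1) = 1/7$.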
CR · Jul 16, 2016
Sieving for closest lattice vectors (with preprocessing)
Thijs Laarhoven
Lattice-based cryptography has recently emerged as a prime candidate for efficient and secure post-quantum cryptography. The two main hard problems underlying its security are the shortest vector problem (SVP) and the closest vector problem (CVP). Various algorithms have been studied for solving these problems, and for SVP, lattice sieving currently dominates in terms of the asymptotic time complexity: one can heuristically solve SVP in time $2^{0.292d}$ in high dimensions $d$ [BDGL'16]. Although several SVP algorithms can also be used to solve CVP, it is not clear whether this also holds for heuristic lattice sieving methods. The best time complexity for CVP is currently $2^{0.377d}$ [BGJ'14]. In this paper we revisit sieving algorithms for solving SVP, and study how these algorithms can be modified to solve CVP and its variants as well. Our first method is aimed at solving one problem instance and minimizes the overall time complexity for a single CVP instance with a time complexity of $2^{0.292d}$. Our second method minimizes the amortized time complexity for several instances on the same lattice, at the cost of a larger preprocessing cost. We can solve the closest vector problem with preprocessing (CVPP) with $2^{0.636d}$ space and preprocessing, in $2^{0.136d}$ time, while the query complexity can even be reduced to $2^{εd}$ at the cost of preprocessing time and memory complexities of $(1/ε)^{O(d)}$. For easier variants of CVP, such as approximate CVP and bounded distance decoding (BDD), we further show how the preprocessing method achieves even better complexities. For instance, we can solve approximate CVPP with large approximation factors $k$ with polynomial-sized advice in polynomial time if $k = Ω(\sqrt{d/\log d})$, heuristically closing the gap between the decision-CVPP result of [AR'04] and the search-CVPP result of [DRS'14].
DS · Nov 24, 2015
Tradeoffs for nearest neighbors on the sphere
Thijs Laarhoven
We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{ρ_q}$ and update complexity $n^{ρ_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $ρ_q$ and $ρ_u$: $$c^2\sqrt{ρ_q}+(c^2-1)\sqrt{ρ_u}=\sqrt{2c^2-1}.$$ For small $c=1+ε$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4ε^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4ε^2)}$, matching the bound $n^{Ω(1/ε^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{Ω(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$.
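The limiting cases quoted above follow directly from the trade-off equation. As a short worked derivation (lower-order terms in the exponents are ignored):

```latex
\begin{align*}
\rho_u = 0: \quad
  & \rho_q = \frac{2c^2 - 1}{c^4} = \frac{2}{c^2} - \frac{1}{c^4}, \\
\rho_q = \rho_u = \rho: \quad
  & (2c^2 - 1)\sqrt{\rho} = \sqrt{2c^2 - 1}
    \;\Longrightarrow\; \rho = \frac{1}{2c^2 - 1}, \\
\rho_q = 0: \quad
  & \rho_u = \frac{2c^2 - 1}{(c^2 - 1)^2} = \frac{2}{c^2} + O\!\left(\frac{1}{c^4}\right)
    \quad \text{for large } c.
\end{align*}
% Small approximation factors, c = 1 + \varepsilon:
\begin{align*}
\frac{2c^2 - 1}{c^4}
  &= \frac{1 + 4\varepsilon + 2\varepsilon^2}{(1 + \varepsilon)^4}
   = 1 - 4\varepsilon^2 + O(\varepsilon^3), \\
\frac{2c^2 - 1}{(c^2 - 1)^2}
  &= \frac{1 + 4\varepsilon + 2\varepsilon^2}{(2\varepsilon + \varepsilon^2)^2}
   = \frac{1}{4\varepsilon^2}\bigl(1 + O(\varepsilon)\bigr).
\end{align*}
```

These reproduce the exponents $1 - 4ε^2$, $1/(2c^2-1)$, $1/(4ε^2)$ and $2/c^2 + O(1/c^4)$ stated in the abstract.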
DS · Sep 9, 2015
Practical and Optimal LSH for Angular Distance
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven et al.
We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
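The hash family from this line of work is referred to elsewhere in this list as cross-polytope hashing: rotate the vector randomly, then map it to the closest signed standard basis vector. Below is a minimal sketch under stated assumptions: a dense Gaussian matrix stands in for the random rotation, and `make_cross_polytope_hash` is a hypothetical name. It does not include the multiprobe scheme or any fast pseudo-rotation.

```python
import random

def make_cross_polytope_hash(d, seed=0):
    """Return a hash function mapping d-dimensional vectors to (index, sign)."""
    rng = random.Random(seed)
    # Assumption: a d x d Gaussian matrix is used in place of a true rotation.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

    def h(x):
        y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
        # The closest vertex of the cross-polytope {+-e_i} is determined by
        # the coordinate with the largest absolute value and its sign.
        i = max(range(d), key=lambda k: abs(y[k]))
        return (i, 1 if y[i] > 0 else -1)

    return h
```

Each hash function produces one of $2d$ values, and antipodal inputs land on opposite vertices of the same axis.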
CR · Feb 12, 2015
Optimal sequential fingerprinting: Wald vs. Tardos
Thijs Laarhoven
We study sequential collusion-resistant fingerprinting, where the fingerprinting code is generated in advance but accusations may be made between rounds, and show that in this setting both the dynamic Tardos scheme and schemes building upon Wald's sequential probability ratio test (SPRT) are asymptotically optimal. We further compare these two approaches to sequential fingerprinting, highlighting differences between the two schemes. Based on these differences, we argue that Wald's scheme should in general be preferred over the dynamic Tardos scheme, even though both schemes have their merits. As a side result, we derive an optimal sequential group testing method for the classical model, which can easily be generalized to different group testing models.
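For readers unfamiliar with Wald's sequential probability ratio test, the generic textbook procedure for Bernoulli observations can be sketched as follows. This is not the fingerprinting scheme from the paper; the function name, the Bernoulli model, and the use of Wald's approximate thresholds are assumptions chosen for illustration.

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha, beta):
    """Generic SPRT deciding between H0: p = p0 and H1: p = p1.

    alpha is the target false-positive rate, beta the target false-negative
    rate. Returns (decision, number of samples used)."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Add the log-likelihood ratio of the new observation.
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return ("H1", n)
        if llr <= lower:
            return ("H0", n)
    return ("undecided", n)
```

The test stops as soon as the accumulated evidence crosses either threshold, which is what allows decisions to be made between rounds rather than only after a fixed number of observations.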
IT · Apr 9, 2014
Asymptotics of Fingerprinting and Group Testing: Capacity-Achieving Log-Likelihood Decoders
Thijs Laarhoven
We study the large-coalition asymptotics of fingerprinting and group testing, and derive explicit decoders that provably achieve capacity for many of the considered models. We do this both for simple decoders (fast but suboptimal) and for joint decoders (slow but optimal), and both for informed and uninformed settings. For fingerprinting, we show that if the pirate strategy is known, the Neyman-Pearson-based log-likelihood decoders provably achieve capacity, regardless of the strategy. The decoder built against the interleaving attack is further shown to be a universal decoder, able to deal with arbitrary attacks and achieving the uninformed capacity. This universal decoder is shown to be closely related to the Lagrange-optimized decoder of Oosterwijk et al. and the empirical mutual information decoder of Moulin. Joint decoders are also proposed, and we conjecture that these also achieve the corresponding joint capacities. For group testing, the simple decoder for the classical model is shown to be more efficient than the one of Chan et al. and it provably achieves the simple group testing capacity. For generalizations of this model such as noisy group testing, the resulting simple decoders also achieve the corresponding simple capacities.
IT · Apr 9, 2014
Asymptotics of Fingerprinting and Group Testing: Tight Bounds from Channel Capacities
Thijs Laarhoven
In this work we consider the large-coalition asymptotics of various fingerprinting and group testing games, and derive explicit expressions for the capacities for each of these models. We do this both for simple decoders (fast but suboptimal) and for joint decoders (slow but optimal). For fingerprinting, we show that if the pirate strategy is known, the capacity often decreases linearly with the number of colluders, instead of quadratically as in the uninformed fingerprinting game. For many attacks the joint capacity is further shown to be strictly higher than the simple capacity. For group testing, we improve upon known results about the joint capacities, and derive new explicit asymptotics for the simple capacities. These show that existing simple group testing algorithms are suboptimal, and that simple decoders cannot asymptotically be as efficient as joint decoders. For the traditional group testing model, we show that the gap between the simple and joint capacities is a factor 1.44 for large numbers of defectives.
IT · Jan 22, 2014
Capacities and Capacity-Achieving Decoders for Various Fingerprinting Games
Thijs Laarhoven
Combining an information-theoretic approach to fingerprinting with a more constructive, statistical approach, we derive new results on the fingerprinting capacities for various informed settings, as well as new log-likelihood decoders with provable code lengths that asymptotically match these capacities. The simple decoder built against the interleaving attack is further shown to achieve the simple capacity for unknown attacks, and is argued to be an improved version of the recently proposed decoder of Oosterwijk et al. With this new universal decoder, cut-offs on the bias distribution function can finally be dismissed. Besides the application of these results to fingerprinting, a direct consequence of our results to group testing is that (i) a simple decoder asymptotically requires a factor 1.44 more tests to find defectives than a joint decoder, and (ii) the simple decoder presented in this paper provably achieves this bound.
IT · Jul 9, 2013
Efficient Probabilistic Group Testing Based on Traitor Tracing
Thijs Laarhoven
Inspired by recent results from collusion-resistant traitor tracing, we provide a framework for constructing efficient probabilistic group testing schemes. In the traditional group testing model, our scheme asymptotically requires T ~ 2 K ln N tests to find (with high probability) the correct set of K defectives out of N items. The framework is also applied to several noisy group testing and threshold group testing models, often leading to improvements over previously known results, but we emphasize that this framework can be applied to other variants of the classical model as well, both in adaptive and in non-adaptive settings.
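As a quick sense of scale, the leading term T ~ 2 K ln N can be evaluated directly. The helper below only computes that leading term (lower-order terms are ignored), and its name is an assumption.

```python
import math

def approx_tests_needed(num_items, num_defectives):
    """Leading-order number of tests, T ~ 2 K ln N."""
    return 2.0 * num_defectives * math.log(num_items)
```

For example, with N = 1,000,000 items and K = 10 defectives this gives roughly 276 tests, compared to one million tests when every item is tested individually.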
CR · Jun 30, 2013
Dynamic Traitor Tracing Schemes, Revisited
Thijs Laarhoven
We revisit recent results from the area of collusion-resistant traitor tracing, and show how they can be combined and improved to obtain more efficient dynamic traitor tracing schemes. In particular, we show how the dynamic Tardos scheme of Laarhoven et al. can be combined with the optimized score functions of Oosterwijk et al. to trace coalitions much faster. If the attack strategy is known, in many cases the order of the code length goes down from quadratic to linear in the number of colluders, while if the attack is not known, we show how the interleaving defense may be used to catch all colluders about twice as fast as in the dynamic Tardos scheme. Some of these results also apply to the static traitor tracing setting where the attack strategy is known in advance, and to group testing.
CR · Feb 7, 2013
Discrete Distributions in the Tardos Scheme, Revisited
Thijs Laarhoven, Benne de Weger
The Tardos scheme is a well-known traitor tracing scheme to protect copyrighted content against collusion attacks. The original scheme contained some suboptimal design choices, such as the score function and the distribution function used for generating the biases. Skoric et al. previously showed that a symbol-symmetric score function leads to shorter codes, while Nuida et al. obtained the optimal distribution functions for arbitrary coalition sizes. Later, Nuida et al. showed that combining these results leads to even shorter codes when the coalition size is small. We extend their analysis to the case of large coalitions and prove that these optimal distributions converge to the arcsine distribution, thus showing that the arcsine distribution is asymptotically optimal in the symmetric Tardos scheme. We also present a new, practical alternative to the discrete distributions of Nuida et al. and give a comparison of the estimated lengths of the fingerprinting codes for each of these distributions.
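To make the role of the bias distribution concrete, here is a minimal sketch of code generation with arcsine-distributed biases. It uses the full arcsine distribution without any cut-off and without the discrete distributions discussed in the abstract; the function names and the seeding are assumptions.

```python
import math
import random

def sample_arcsine_bias(rng):
    """Draw p in [0, 1) from the arcsine distribution, whose density is
    1 / (pi * sqrt(p * (1 - p))), by inverting its CDF (2/pi) * asin(sqrt(p))."""
    return math.sin(0.5 * math.pi * rng.random()) ** 2

def generate_code(num_users, length, seed=0):
    """Generate one bias per code position and one binary code word per user."""
    rng = random.Random(seed)
    biases = [sample_arcsine_bias(rng) for _ in range(length)]
    # Each user receives symbol 1 at position i with probability biases[i].
    code = [[1 if rng.random() < p else 0 for p in biases]
            for _ in range(num_users)]
    return biases, code
```

Most of the probability mass of the arcsine distribution lies near 0 and 1, so many positions are strongly biased towards one symbol.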
CR · Jan 25, 2013
Solving the Shortest Vector Problem in Lattices Faster Using Quantum Search
Thijs Laarhoven, Michele Mosca, Joop van de Pol
By applying Grover's quantum search algorithm to the lattice algorithms of Micciancio and Voulgaris, Nguyen and Vidick, Wang et al., and Pujol and Stehlé, we obtain improved asymptotic quantum results for solving the shortest vector problem. With quantum computers we can provably find a shortest vector in time $2^{1.799n + o(n)}$, improving upon the classical time complexity of $2^{2.465n + o(n)}$ of Pujol and Stehlé and the $2^{2n + o(n)}$ of Micciancio and Voulgaris, while heuristically we expect to find a shortest vector in time $2^{0.312n + o(n)}$, improving upon the classical time complexity of $2^{0.384n + o(n)}$ of Wang et al. These quantum complexities will be an important guide for the selection of parameters for post-quantum cryptosystems based on the hardness of the shortest vector problem.
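The source of the speed-up is the square-root scaling of Grover's algorithm. The sketch below evaluates the textbook formulas for searching a list with a single marked item; it is not the lattice-specific analysis from the paper, and the function names are assumptions.

```python
import math

def grover_iterations(n_items):
    """Number of Grover iterations, about (pi / 4) * sqrt(N) for one marked item."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))

def grover_success_probability(n_items, iterations):
    """Probability of measuring the marked item after the given number of iterations."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2
```

For a list of 1024 items, 25 iterations suffice to find the marked item with probability above 0.99, where a classical search needs on the order of 1024 lookups.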
CR · Jun 28, 2012
Dynamic Traitor Tracing for Arbitrary Alphabets: Divide and Conquer
Thijs Laarhoven, Jan-Jaap Oosterwijk, Jeroen Doumen
We give a generic divide-and-conquer approach for constructing collusion-resistant probabilistic dynamic traitor tracing schemes with larger alphabets from schemes with smaller alphabets. This construction offers a linear tradeoff between the alphabet size and the codelength. In particular, we show that applying our results to the binary dynamic Tardos scheme of Laarhoven et al. leads to schemes that are shorter by a factor equal to half the alphabet size. Asymptotically, these codelengths correspond, up to a constant factor, to the fingerprinting capacity for static probabilistic schemes. This gives a hierarchy of probabilistic dynamic traitor tracing schemes, and bridges the gap between the low bandwidth, high codelength scheme of Laarhoven et al. and the high bandwidth, low codelength scheme of Fiat and Tassa.