Alexander Golovnev

CR
9papers
95citations
Novelty49%
AI Score47

9 Papers

CCApr 2
Linear Space Streaming Lower Bounds for Approximating CSPs

Chi-Ning Chou, Alexander Golovnev, Madhu Sudan et al.

We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Ω(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $Ω(n)$ space. The key technical core is an optimal, $q^{-(k-1)}$-inapproximability for the Max $k$-LIN-$\bmod\; q$ problem, which is the Max CSP problem where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of Max $k$-LIN-$\bmod\; q$ with ${k=q=2}$. For general CSPs in the streaming setting, prior results only yielded $Ω(\sqrt{n})$ space bounds. In particular no linear space lower bound was known for an approximation factor less than $1/2$ for any CSP. Extending the work of Kapralov and Krachun to Max $k$-LIN-$\bmod\; q$ to $k>2$ and $q>2$ (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work.

DSApr 23
Improved Time-Space Tradeoffs for 3SUM-Indexing

Itai Dinur, Alexander Golovnev

3SUM-Indexing is a preprocessing variant of the 3SUM problem that has recently received a lot of attention. The best known time-space tradeoff for the problem is $T S^3 = n^{6}$ (up to logarithmic factors), where $n$ is the number of input integers, $S$ is the length of the preprocessed data structure, and $T$ is the running time of the query algorithm. This tradeoff was achieved in [KP19, GGHPV20] using the Fiat-Naor generic algorithm for Function Inversion. Consequently, [GGHPV20] asked whether this algorithm can be improved by leveraging the structure of 3SUM-Indexing. In this paper, we exploit the structure of 3SUM-Indexing to give a time-space tradeoff of $T S = n^{2.5}$, which is better than the best known one in the range $n^{3/2} \ll S \ll n^{7/4}$. We further extend this improvement to the $k$SUM-Indexing problem-a generalization of 3SUM-Indexing-and to the related $k$XOR-Indexing problem, where addition is replaced with XOR. we improve the known time-space tradeoff for the Jumbled Indexing problem, which is a well-known data structure problem related to 3SUM-Indexing. Our improvement comes from an alternative way to apply the Fiat-Naor algorithm to 3SUM-Indexing. Specifically, we exploit the structure of the function to be inverted by decomposing it into "sub-functions" with certain properties. This allows us to apply an improvement to the Fiat-Naor algorithm (which is not directly applicable to 3SUM-Indexing), obtained in [GGPS23] in a much larger range of parameters. We believe that our techniques may be useful in additional application-dependent optimizations of the Fiat-Naor algorithm.

DSMay 6
Online Orthogonal Vectors Revisited

Karthik Gajulapalli, Alexander Golovnev, Samuel King et al.

We prove new upper and lower bounds for the Online Orthogonal Vectors Problem ($\mathsf{OnlineOV}_{n,d}$). In this problem, a preprocessing algorithm receives $n$ vectors $x_1,\ldots,x_n\in\{0,1\}^d$ and constructs a data structure of size $S$. A query algorithm subsequently receives a query vector $q\in\{0,1\}^d$ and in time $T$ decides whether $q$ is orthogonal to any of the input vectors $x_i$. We design a new deterministic data structure for $\mathsf{OnlineOV}_{n,d}$. In low dimensions ($d = c \log n$), our data structure matches the performance of the best known randomized algorithm due to Chan [SoCG 2017]. Furthermore, in moderate dimensions ($d=n^{\varepsilon}$), we give the first improvement since Charikar, Indyk and Panigrahy [ICALP 2002]. Along the way, we give the first deterministic refutation of a conjecture on the hardness of $\mathsf{OnlineOV}$ posed by Goldstein, Lewenstein and Porat [ISAAC 2017]. This data structure also extends to a number of problems, including Partial Match, Orthogonal Range Search, and DNF Evaluation. We use a novel structure-versus-randomness decomposition to design our algorithm. Under the Non-Uniform Strong Exponential Time Hypothesis, we also prove arbitrarily large polynomial space lower bounds for any $\mathsf{OnlineOV}$ data structure with sublinear query time even with computationally unbounded preprocessing. These lower bounds extend to several other problems, including Polynomial Evaluation, Partial Match, Orthogonal Range Search, and Approximate Nearest Neighbors. We also prove similar lower bounds for $\mathsf{3-SUM}$ with preprocessing under the Non-Uniform Hamiltonian Path Conjecture.

CRAug 24, 2019
An Attack on the the Encryption Scheme of the Moscow Internet Voting System

Alexander Golovnev

The next Moscow City Duma elections will be held on September 8th with an option of Internet voting. Some source code of the voting system is posted online for public testing. Pierrick Gaudry recently showed that due to the relatively small length of the key, the encryption scheme could be easily broken. This issue has been fixed in the current version of the voting system. In this note we show that the new implementation of the ElGamal encryption system is not semantically secure. We also demonstrate how this newly found security vulnerability can be potentially used for counting the number of votes cast for a candidate.

CRAug 14, 2019
Breaking the encryption scheme of the Moscow Internet voting system

Pierrick Gaudry, Alexander Golovnev

In September 2019, voters for the election at the Parliament of the city of Moscow were allowed to use an Internet voting system. The source code of it had been made available for public testing. In this paper we show two successful attacks on the encryption scheme implemented in the voting system. Both attacks were sent to the developers of the system, and both issues had been fixed after that.The encryption used in this system is a variant of ElGamal over finite fields. In the first attack we show that the used key sizes are too small. We explain how to retrieve the private keys from the public keys in a matter of minutes with easily available resources.When this issue had been fixed and the new system had become available for testing, we discovered that the new implementation was not semantically secure. We demonstrate how this newly found security vulnerability can be used for counting the number of votes cast for a candidate.

DSJul 19, 2019
Data Structures Meet Cryptography: 3SUM with Preprocessing

Alexander Golovnev, Siyao Guo, Thibaut Horel et al.

This paper shows several connections between data structure problems and cryptography against preprocessing attacks. Our results span data structure upper bounds, cryptographic applications, and data structure lower bounds, as summarized next. First, we apply Fiat--Naor inversion, a technique with cryptographic origins, to obtain a data structure upper bound. In particular, our technique yields a suite of algorithms with space $S$ and (online) time $T$ for a preprocessing version of the $N$-input 3SUM problem where $S^3\cdot T = \widetilde{O}(N^6)$. This disproves a strong conjecture (Goldstein et al., WADS 2017) that there is no data structure that solves this problem for $S=N^{2-δ}$ and $T = N^{1-δ}$ for any constant $δ>0$. Secondly, we show equivalence between lower bounds for a broad class of (static) data structure problems and one-way functions in the random oracle model that resist a very strong form of preprocessing attack. Concretely, given a random function $F: [N] \to [N]$ (accessed as an oracle) we show how to compile it into a function $G^F: [N^2] \to [N^2]$ which resists $S$-bit preprocessing attacks that run in query time $T$ where $ST=O(N^{2-\varepsilon})$ (assuming a corresponding data structure lower bound on 3SUM). In contrast, a classical result of Hellman tells us that $F$ itself can be more easily inverted, say with $N^{2/3}$-bit preprocessing in $N^{2/3}$ time. We also show that much stronger lower bounds follow from the hardness of kSUM. Our results can be equivalently interpreted as security against adversaries that are very non-uniform, or have large auxiliary input, or as security in the face of a powerfully backdoored random oracle. Thirdly, we give non-adaptive lower bounds for 3SUM and a range of geometric problems which match the best known lower bounds for static data structure problems.

LGJun 1, 2019
On the computational complexity of the probabilistic label tree algorithms

Robert Busa-Fekete, Krzysztof Dembczynski, Alexander Golovnev et al.

Label tree-based algorithms are widely used to tackle multi-class and multi-label problems with a large number of labels. We focus on a particular subclass of these algorithms that use probabilistic classifiers in the tree nodes. Examples of such algorithms are hierarchical softmax (HSM), designed for multi-class classification, and probabilistic label trees (PLTs) that generalize HSM to multi-label problems. If the tree structure is given, learning of PLT can be solved with provable regret guaranties [Wydmuch et.al. 2018]. However, to find a tree structure that results in a PLT with a low training and prediction computational costs as well as low statistical error seems to be a very challenging problem, not well-understood yet. In this paper, we address the problem of finding a tree structure that has low computational cost. First, we show that finding a tree with optimal training cost is NP-complete, nevertheless there are some tractable special cases with either perfect approximation or exact solution that can be obtained in linear time in terms of the number of labels $m$. For the general case, we obtain $O(\log m)$ approximation in linear time too. Moreover, we prove an upper bound on the expected prediction cost expressed in terms of the expected training cost. We also show that under additional assumptions the prediction cost of a PLT is $O(\log m)$.

LGJan 16, 2019
The information-theoretic value of unlabeled data in semi-supervised learning

Alexander Golovnev, Dávid Pál, Balázs Szörényi

We quantify the separation between the numbers of labeled examples required to learn in two settings: Settings with and without the knowledge of the distribution of the unlabeled data. More specifically, we prove a separation by $Θ(\log n)$ multiplicative factor for the class of projections over the Boolean hypercube of dimension $n$. We prove that there is no separation for the class of all functions on domain of any size. Learning with the knowledge of the distribution (a.k.a. fixed-distribution learning) can be viewed as an idealized scenario of semi-supervised learning where the number of unlabeled data points is so great that the unlabeled distribution is known exactly. For this reason, we call the separation the value of unlabeled data.

CRApr 28, 2015
Condensed Unpredictability

Maciej Skorski, Alexander Golovnev, Krzysztof Pietrzak

We consider the task of deriving a key with high HILL entropy from an unpredictable source. Previous to this work, the only known way to transform unpredictability into a key that was $\eps$ indistinguishable from having min-entropy was via pseudorandomness, for example by Goldreich-Levin (GL) hardcore bits. This approach has the inherent limitation that from a source with $k$ bits of unpredictability entropy one can derive a key of length (and thus HILL entropy) at most $k-2\log(1/ε)$ bits. In many settings, e.g. when dealing with biometric data, such a $2\log(1/ε)$ bit entropy loss in not an option. Our main technical contribution is a theorem that states that in the high entropy regime, unpredictability implies HILL entropy. The loss in circuit size in this argument is exponential in the entropy gap $d$. To overcome the above restriction, we investigate if it's possible to first "condense" unpredictability entropy and make the entropy gap small. We show that any source with $k$ bits of unpredictability can be condensed into a source of length $k$ with $k-3$ bits of unpredictability entropy. Our condenser simply "abuses" the GL construction and derives a $k$ bit key from a source with $k$ bits of unpredicatibily. The original GL theorem implies nothing when extracting that many bits, but we show that in this regime, GL still behaves like a "condenser" for unpredictability. This result comes with two caveats (1) the loss in circuit size is exponential in $k$ and (2) we require that the source we start with has \emph{no} HILL entropy (equivalently, one can efficiently check if a guess is correct). We leave it as an intriguing open problem to overcome these restrictions or to prove they're inherent.