88.4CCApr 23
Polynomial Lower Bounds for Arithmetic Circuits over Non-Commutative RingsRan Raz
We prove a lower bound of $Ω\left(n^{1.5}\right)$ for the number of product gates in non-commutative arithmetic circuits for an explicit $n$-variate degree-$n$ polynomial $f_{n}$ (over every field). We observe that this implies that over certain non-commutative rings $R$, any arithmetic circuit that computes the induced polynomial function $f_{n}: R^n \rightarrow R$, using the ring operations of addition and multiplication in $R$, requires at least $Ω\left(n^{1.5}\right)$ multiplications. More generally, for any $d\geq 2$ and sufficiently large $n$, we obtain a lower bound of $Ω\left(d\sqrt{n}\right)$ for $n$-variate degree-$d$ polynomials, for both these models. Prior to our work, the only known lower bounds for the size of non-commutative circuits, or for the size of arithmetic circuits over any ring, were slightly super-linear in $\max\{n,d\}$: $Ω\left(n\log d\right)$ by Baur and Strassen, and $Ω\left(d\log n\right)$ by Nisan. (Nisan's bound was proved for non-commutative arithmetic circuits and implies a bound for arithmetic circuits over non-commutative rings by our observation).
LGJul 5, 2021
Memory-Sample Lower Bounds for Learning Parity with NoiseSumegha Garg, Pravesh K. Kothari, Pengda Liu et al.
In this work, we show, for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with probability $\frac{1}{2}+\varepsilon$ and flipped with probability $\frac{1}{2}-\varepsilon$, that any learning algorithm requires either a memory of size $Ω(n^2/\varepsilon)$ or an exponential number of samples. In fact, we study memory-sample lower bounds for a large class of learning problems, as characterized by [GRT'18], when the samples are noisy. A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem with error parameter $\varepsilon$: an unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$ with probability $1/2+\varepsilon$ and $b_i = -M(a_i,x)$ with probability $1/2-\varepsilon$ ($0<\varepsilon< \frac{1}{2}$). Assume that $k,\ell, r$ are such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns, has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$, with error, requires either a memory of size at least $Ω\left(\frac{k \cdot \ell}{\varepsilon} \right)$, or at least $2^{Ω(r)}$ samples. In particular, this shows that for a large class of learning problems, same as those in [GRT'18], any learning algorithm requires either a memory of size at least $Ω\left(\frac{(\log |X|) \cdot (\log |A|)}{\varepsilon}\right)$ or an exponential number of noisy samples. Our proof is based on adapting the arguments in [Raz'17,GRT'18] to the noisy case.
CCFeb 17, 2020
Time-Space Tradeoffs for Distinguishing Distributions and Applications to Security of Goldreich's PRGSumegha Garg, Pravesh K. Kothari, Ran Raz
In this work, we establish lower-bounds against memory bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting. In our first result, we show that any algorithm that distinguishes between uniform distribution on $\{0,1\}^n$ and uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligible advantage needs $2^{Ω(n)}$ samples or $Ω(n^2)$ memory. Our second result applies to distinguishing outputs of Goldreich's local pseudorandom generator from the uniform distribution on the output domain. Specifically, Goldreich's pseudorandom generator $G$ fixes a predicate $P:\{0,1\}^k \rightarrow \{0,1\}$ and a collection of subsets $S_1, S_2, \ldots, S_m \subseteq [n]$ of size $k$. For any seed $x \in \{0,1\}^n$, it outputs $P(x_{S_1}), P(x_{S_2}), \ldots, P(x_{S_m})$ where $x_{S_i}$ is the projection of $x$ to the coordinates in $S_i$. We prove that whenever $P$ is $t$-resilient (all non-zero Fourier coefficients of $(-1)^P$ are of degree $t$ or higher), then no algorithm, with $<n^ε$ memory, can distinguish the output of $G$ from the uniform distribution on $\{0,1\}^m$ with a large inverse polynomial advantage, for stretch $m \le \left(\frac{n}{t}\right)^{\frac{(1-ε)}{36}\cdot t}$ (barring some restrictions on $k$). The lower bound holds in the streaming model where at each time step $i$, $S_i\subseteq [n]$ is a randomly chosen (ordered) subset of size $k$ and the distinguisher sees either $P(x_{S_i})$ or a uniformly random bit along with $S_i$. Our proof builds on the recently developed machinery for proving time-space trade-offs (Raz 2016 and follow-ups) for search/learning problems.
LGAug 8, 2017
Extractor-Based Time-Space Lower Bounds for LearningSumegha Garg, Ran Raz, Avishay Tal
A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem: An unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$. Assume that $k,\ell, r$ are such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns, has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$ requires either a memory of size at least $Ω\left(k \cdot \ell \right)$, or at least $2^{Ω(r)}$ samples. The result holds even if the learner has an exponentially small success probability (of $2^{-Ω(r)}$). In particular, this shows that for a large class of learning problems, any learning algorithm requires either a memory of size at least $Ω\left((\log |X|) \cdot (\log |A|)\right)$ or an exponential number of samples, achieving a tight $Ω\left((\log |X|) \cdot (\log |A|)\right)$ lower bound on the size of the memory, rather than a bound of $Ω\left(\min\left\{(\log |X|)^2,(\log |A|)^2\right\}\right)$ obtained in previous works [R17,MM17b]. Moreover, our result implies all previous memory-samples lower bounds, as well as a number of new applications. Our proof builds on [R17] that gave a general technique for proving memory-samples lower bounds.
LGFeb 16, 2016
Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity LearningRan Raz
We prove that any algorithm for learning parities requires either a memory of quadratic size or an exponential number of samples. This proves a recent conjecture of Steinhardt, Valiant and Wager and shows that for some learning problems a large storage space is crucial. More formally, in the problem of parity learning, an unknown string $x \in \{0,1\}^n$ was chosen uniformly at random. A learner tries to learn $x$ from a stream of samples $(a_1, b_1), (a_2, b_2) \ldots$, where each~$a_t$ is uniformly distributed over $\{0,1\}^n$ and $b_t$ is the inner product of $a_t$ and $x$, modulo~2. We show that any algorithm for parity learning, that uses less than $\frac{n^2}{25}$ bits of memory, requires an exponential number of samples. Previously, there was no non-trivial lower bound on the number of samples needed, for any learning problem, even if the allowed memory size is $O(n)$ (where $n$ is the space needed to store one sample). We also give an application of our result in the field of bounded-storage cryptography. We show an encryption scheme that requires a private key of length $n$, as well as time complexity of $n$ per encryption/decription of each bit, and is provenly and unconditionally secure as long as the attacker uses less than $\frac{n^2}{25}$ memory bits and the scheme is used at most an exponential number of times. Previous works on bounded-storage cryptography assumed that the memory size used by the attacker is at most linear in the time needed for encryption/decription.