AISep 11, 2023
Errors are Robustly Tamed in Cumulative Knowledge ProcessesAnna Brandenberger, Cassandra Marcussen, Elchanan Mossel et al. · mit
We study processes of societal knowledge accumulation, where the validity of a new unit of knowledge depends both on the correctness of its derivation and on the validity of the units it depends on. A fundamental question in this setting is: If a constant fraction of the new derivations is wrong, can investing a constant fraction, bounded away from one, of effort ensure that a constant fraction of knowledge in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced a concrete probabilistic model to analyze such questions and showed an affirmative answer to this question. Their study, however, focuses on the simple case where each new unit depends on just one existing unit, and units attach according to a $\textit{preferential attachment rule}$. In this work, we consider much more general families of cumulative knowledge processes, where new units may attach according to varied attachment mechanisms and depend on multiple existing units. We also allow a (random) fraction of insertions of adversarial nodes. We give a robust affirmative answer to the above question by showing that for $\textit{all}$ of these models, as long as many of the units follow simple heuristics for checking a bounded number of units they depend on, all errors will be eventually eliminated. Our results indicate that preserving the quality of large interdependent collections of units of knowledge is feasible, as long as careful but not too costly checks are performed when new units are derived/deposited.
100.0CCApr 2
Linear Space Streaming Lower Bounds for Approximating CSPsChi-Ning Chou, Alexander Golovnev, Madhu Sudan et al.
We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Ω(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $Ω(n)$ space. The key technical core is an optimal, $q^{-(k-1)}$-inapproximability for the Max $k$-LIN-$\bmod\; q$ problem, which is the Max CSP problem where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of Max $k$-LIN-$\bmod\; q$ with ${k=q=2}$. For general CSPs in the streaming setting, prior results only yielded $Ω(\sqrt{n})$ space bounds. In particular no linear space lower bound was known for an approximation factor less than $1/2$ for any CSP. Extending the work of Kapralov and Krachun to Max $k$-LIN-$\bmod\; q$ to $k>2$ and $q>2$ (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work.
DSFeb 17
Markov Chains with RewindingAmir Azarmehr, Soheil Behnezhad, Alma Ghafari et al.
Motivated by techniques developed in recent progress on lower bounds for sublinear time algorithms (Behnezhad, Roghani and Rubinstein, STOC 2023, FOCS 2023, and STOC 2024) we introduce and study a new class of randomized algorithmic processes that we call Markov Chains with Rewinding. In this setting, an algorithm interacts with a (partially observable) Markovian random evolution by strategically rewinding the Markov chain to previous states. Depending on the application, this may lead the evolution to desired states faster, or allow the agent to efficiently learn or test properties of the underlying Markov chain that may be infeasible or inefficient with passive observation. We study the task of identifying the initial state in a given partially observable Markov chain. Analysis of this question in specific Markov chains is the central ingredient in the above cited works and we aim to systematize the analysis in our work. Our first result is that any pair of states distinguishable with any rewinding strategy can also be distinguished with a non-adaptive rewinding strategy (one whose rewinding choices are determined before observing any outcomes of the chain). Therefore, while rewinding strategies can be shown to be strictly more powerful than passive strategies (those that do not rewind back to previous states), adaptivity does not give additional power to a rewinding strategy in the absence of efficiency considerations. The difference becomes apparent however when we introduce a natural efficiency measure, namely the query complexity (i.e., the number of observations they need to identify distinguishable states). Our second main contribution is to quantify this efficiency gap. We present a non-adaptive rewinding strategy whose query complexity is within a polynomial of that of the optimal (adaptive) strategy, and show that such a polynomial loss is necessary in general.
67.4CCMar 24
Ideals, Macaulay Bases, and PCPsPrashanth Amireddy, Amik Raj Behera, Srikanth Srinivasan et al.
All known proofs of the PCP theorem rely on multiple "composition" steps, where PCPs over large alphabets are turned into PCPs over much smaller alphabets at a (relatively) small price in the soundness error of the PCP. Algebraic proofs, starting with the work of Arora, Lund, Motwani, Sudan, and Szegedy use at least 2 such composition steps, whereas the "Gap amplification" proof of Dinur uses $Î(\log n)$ such composition steps. In this work, we present the first PCP construction using just one composition step. The key ingredient, missing in previous work and finally supplied in this paper, is a basic PCP (of Proximity) of size $2^{n^ε}$, for any $ε> 0$, that makes $O_ε(1)$ queries. At the core of our new construction is a new class of alternatives to "sum-check" protocols. As used in past PCPs, these provide a method by which to verify that an $m$-variate degree $d$ polynomial $P$ evaluates to zero at every point of some set $S \subseteq \mathbb{F}_q^m$. Previous works had shown how to check this condition for sets of the form $S = H^m$ using $O(m)$ queries with alphabet $\mathbb{F}_q^d$ assuming $d \geq |H|$. Our work improves this basic protocol in two ways: First we extend it to broader classes of sets $S$ (ones closer to Hamming balls rather than cubes). Second, it reduces the number of queries from $O(m)$ to an absolute constant for the settings of $S$ we consider. Specifically when $S = (\{0,1\}^{m/c}_{\leq 1})^c$, we give such an alternate to the sum-check protocol with $O(1)$ queries with alphabet $\mathbb{F}_q^{O(c+d)}$, using proofs of size $q^{O(m^2/c)}$. Our new protocols use the notion of Macaulay bases to extend previously known protocols to these new settings with surprising ease. In doing so, they highlight why these notions from algebra may be of further use in complexity theory.
16.3CCApr 5
Expanders Meet Reed--Muller: Easy Instances of Noisy k-XORJarosÅaw BÅasiok, Paul Lou, Alon Rosen et al.
In the noisy $k$-XOR problem, one is given $y \in \mathbb{F}_2^M$ and must distinguish between $y$ uniform and $y = A x + e$, where $A$ is the adjacency matrix of a $k$-left-regular bipartite graph with $N$ variables and $M$ constraints, $x\in \mathbb{F}_2^N$ is random, and $e$ is noise with rate $η$. Lower bounds in restricted computational models such as Sum-of-Squares and low-degree polynomials are closely tied to the expansion of $A$, leading to conjectures that expansion implies hardness. We show that such conjectures are false by constructing an explicit family of graphs with near-optimal expansion for which noisy $k$-XOR is solvable in polynomial time. Our construction combines two powerful directions of work in pseudorandomness and coding theory that have not been previously put together. Specifically, our graphs are based on the lossless expanders of Guruswami, Umans and Vadhan (JACM 2009). Our key insight is that by an appropriate interpretation of the vertices of their graphs, the noisy XOR problem turns into the problem of decoding Reed-Muller codes from random errors. Then we build on a powerful body of work from the 2010s correcting from large amounts of random errors. Putting these together yields our construction. Concretely, we obtain explicit families for which noisy $k$-XOR is polynomial-time solvable at constant noise rate $η= 1/3$ for graphs with $M = 2^{O(\log^2 N)}$, $k = (\log N)^{O(1)}$, and $(N^{1-α}, 1-o(1))$-expansion. Under standard conjectures on Reed--Muller codes over the binary erasure channel, this extends to families with $M = N^{O(1)}$, $k=(\log N)^{O(1)}$, expansion $(N^{1-α}, 1-o(1))$ and polynomial-time algorithms at noise rate $η= N^{-c}$.
DSAug 22, 2025
Quality control in sublinear time: a case study via random graphsCassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
Many algorithms are designed to work well on average over inputs. When running such an algorithm on an arbitrary input, we must ask: Can we trust the algorithm on this input? We identify a new class of algorithmic problems addressing this, which we call "Quality Control Problems." These problems are specified by a (positive, real-valued) "quality function" $ρ$ and a distribution $D$ such that, with high probability, a sample drawn from $D$ is "high quality," meaning its $ρ$-value is near $1$. The goal is to accept inputs $x \sim D$ and reject potentially adversarially generated inputs $x$ with $ρ(x)$ far from $1$. The objective of quality control is thus weaker than either component problem: testing for "$ρ(x) \approx 1$" or testing if $x \sim D$, and offers the possibility of more efficient algorithms. In this work, we consider the sublinear version of the quality control problem, where $D \in Δ(\{0,1\}^N)$ and the goal is to solve the $(D ,ρ)$-quality problem with $o(N)$ queries and time. As a case study, we consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the $k$-clique count function $ρ_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to the sample, is of course impossible. Testing if $ρ_k(G)\approx 1$ requires $p^{-Ω(k^2)}$ samples. In contrast, we show that the quality control problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with respect to $ρ_k$ can be tested with $p^{-O(k)}$ queries and time, showing quality control is provably superpolynomially more efficient in this setting. More generally, for a motif $H$ of maximum degree $Δ(H)$, the respective quality control problem can be solved with $p^{-O(Δ(H))}$ queries and running time.
PRFeb 1, 2014
Performance of the Survey Propagation-guided decimation algorithm for the random NAE-K-SAT problemDavid Gamarnik, Madhu Sudan
We show that the Survey Propagation-guided decimation algorithm fails to find satisfying assignments on random instances of the "Not-All-Equal-$K$-SAT" problem if the number of message passing iterations is bounded by a constant independent of the size of the instance and the clause-to-variable ratio is above $(1+o_K(1)){2^{K-1}\over K}\log^2 K$ for sufficiently large $K$. Our analysis in fact applies to a broad class of algorithms described as "sequential local algorithms". Such algorithms iteratively set variables based on some local information and then recurse on the reduced instance. Survey Propagation-guided as well as Belief Propagation-guided decimation algorithms - two widely studied message passing based algorithms, fall under this category of algorithms provided the number of message passing iterations is bounded by a constant. Another well-known algorithm falling into this category is the Unit Clause algorithm. Our work constitutes the first rigorous analysis of the performance of the SP-guided decimation algorithm. The approach underlying our paper is based on an intricate geometry of the solution space of random NAE-$K$-SAT problem. We show that above the $(1+o_K(1)){2^{K-1}\over K}\log^2 K$ threshold, the overlap structure of $m$-tuples of satisfying assignments exhibit a certain clustering behavior expressed in the form of constraints on distances between the $m$ assignments, for appropriately chosen $m$. We further show that if a sequential local algorithm succeeds in finding a satisfying assignment with probability bounded away from zero, then one can construct an $m$-tuple of solutions violating these constraints, thus leading to a contradiction. Along with (citation), this result is the first work which directly links the clustering property of random constraint satisfaction problems to the computational hardness of finding satisfying assignments.