AISep 11, 2023
Errors are Robustly Tamed in Cumulative Knowledge ProcessesAnna Brandenberger, Cassandra Marcussen, Elchanan Mossel et al. · mit
We study processes of societal knowledge accumulation, where the validity of a new unit of knowledge depends both on the correctness of its derivation and on the validity of the units it depends on. A fundamental question in this setting is: If a constant fraction of the new derivations is wrong, can investing a constant fraction, bounded away from one, of effort ensure that a constant fraction of knowledge in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced a concrete probabilistic model to analyze such questions and showed an affirmative answer to this question. Their study, however, focuses on the simple case where each new unit depends on just one existing unit, and units attach according to a $\textit{preferential attachment rule}$. In this work, we consider much more general families of cumulative knowledge processes, where new units may attach according to varied attachment mechanisms and depend on multiple existing units. We also allow a (random) fraction of insertions of adversarial nodes. We give a robust affirmative answer to the above question by showing that for $\textit{all}$ of these models, as long as many of the units follow simple heuristics for checking a bounded number of units they depend on, all errors will be eventually eliminated. Our results indicate that preserving the quality of large interdependent collections of units of knowledge is feasible, as long as careful but not too costly checks are performed when new units are derived/deposited.
DSNov 14, 2025
Learning and Testing Convex FunctionsRenato Ferreira Pinto, Cassandra Marcussen, Elchanan Mossel et al.
We consider the problems of \emph{learning} and \emph{testing} real-valued convex functions over Gaussian space. Despite the extensive study of function convexity across mathematics, statistics, and computer science, its learnability and testability have largely been examined only in discrete or restricted settings -- typically with respect to the Hamming distance, which is ill-suited for real-valued functions. In contrast, we study these problems in high dimensions under the standard Gaussian measure, assuming sample access to the function and a mild smoothness condition, namely Lipschitzness. A smoothness assumption is natural and, in fact, necessary even in one dimension: without it, convexity cannot be inferred from finitely many samples. As our main results, we give: - Learning Convex Functions: An agnostic proper learning algorithm for Lipschitz convex functions that achieves error $\varepsilon$ using $n^{O(1/\varepsilon^2)}$ samples, together with a complementary lower bound of $n^{\mathrm{poly}(1/\varepsilon)}$ samples in the \emph{correlational statistical query (CSQ)} model. - Testing Convex Functions: A tolerant (two-sided) tester for convexity of Lipschitz functions with the same sample complexity (as a corollary of our learning result), and a one-sided tester (which never rejects convex functions) using $O(\sqrt{n}/\varepsilon)^n$ samples.
DSFeb 17, 2023
Uniformity Testing over Hypergrids with Subcube ConditioningXi Chen, Cassandra Marcussen
We give an algorithm for testing uniformity of distributions supported on hypergrids $[m_1] \times \cdots \times [m_n]$, which makes $\smash{\widetilde{O}(\text{poly}(m)\sqrt{n}/ε^2)}$ many queries to a subcube conditional sampling oracle with $m=\max_i m_i$. When $m$ is a constant, our algorithm is nearly optimal and strengthens the algorithm of [CCK+21] which has the same query complexity but works for hypercubes $\{\pm 1\}^n$ only. A key technical contribution behind the analysis of our algorithm is a proof of a robust version of Pisier's inequality for functions over hypergrids using Fourier analysis.
DSAug 22, 2025
Quality control in sublinear time: a case study via random graphsCassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
Many algorithms are designed to work well on average over inputs. When running such an algorithm on an arbitrary input, we must ask: Can we trust the algorithm on this input? We identify a new class of algorithmic problems addressing this, which we call "Quality Control Problems." These problems are specified by a (positive, real-valued) "quality function" $ρ$ and a distribution $D$ such that, with high probability, a sample drawn from $D$ is "high quality," meaning its $ρ$-value is near $1$. The goal is to accept inputs $x \sim D$ and reject potentially adversarially generated inputs $x$ with $ρ(x)$ far from $1$. The objective of quality control is thus weaker than either component problem: testing for "$ρ(x) \approx 1$" or testing if $x \sim D$, and offers the possibility of more efficient algorithms. In this work, we consider the sublinear version of the quality control problem, where $D \in Δ(\{0,1\}^N)$ and the goal is to solve the $(D ,ρ)$-quality problem with $o(N)$ queries and time. As a case study, we consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}2$), and the $k$-clique count function $ρ_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to the sample, is of course impossible. Testing if $ρ_k(G)\approx 1$ requires $p^{-Ω(k^2)}$ samples. In contrast, we show that the quality control problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with respect to $ρ_k$ can be tested with $p^{-O(k)}$ queries and time, showing quality control is provably superpolynomially more efficient in this setting. More generally, for a motif $H$ of maximum degree $Δ(H)$, the respective quality control problem can be solved with $p^{-O(Δ(H))}$ queries and running time.