Venkatesan Guruswami

CC
h-index56
11papers
241citations
Novelty61%
AI Score56

11 Papers

DSMay 15
Redundancy Is All You Need (for CSP Sparsification)

Joshua Brakensiek, Venkatesan Guruswami

The seminal work of Benczúr and Karger demonstrated cut sparsifiers of near-linear size. Subsequent extensions have yielded sparsifiers for hypergraph cuts and more recently linear codes over Abelian groups. A decade ago, Kogan and Krauthgamer asked about the sparsifiability of arbitrary constraint satisfaction problems (CSPs). For this question, a trivial lower bound is the size of a non-redundant CSP instance, which admits, for each constraint, an assignment satisfying only that constraint (so that no constraint can be dropped by the sparsifier). For instance, for graph cuts, spanning trees are non-redundant instances. Our main result is that redundant clauses are sufficient for sparsification: for any CSP predicate R, every unweighted instance of CSP(R) has a sparsifier of size at most its non-redundancy (up to polylog and $1/ε$ factors). For weighted instances, we similarly pin down the sparsifiability to the so-called chain length of the predicate. These results precisely determine the extent to which any CSP can be sparsified. Our result is established in the general setting of non-linear codes, or equivalently set families, yielding a VC-type theorem for multiplicative error approximation. A key technical ingredient in our work is a novel application of the entropy method from Gilmer's recent breakthrough on the union-closed sets conjecture. As an immediate consequence of our main theorem, a number of results in the non-redundancy literature immediately extend to CSP sparsification. We also contribute new techniques for understanding the non-redundancy of CSP predicates. By adapting methods from the matching vector codes literature in coding theory, we are able to construct an explicit predicate whose non-redundancy lies between $Ω(n^{1.5})$ and $\widetilde{O}(n^{1.6})$, the first example with a provably non-integral exponent.

CCMar 22
Classification of Non-redundancy of Boolean Predicates of Arity 4

Joshua Brakensiek, Venkatesan Guruswami, Aaron Putterman

Given a constraint satisfaction problem (CSP) predicate $P \subseteq D^r$, the non-redundancy (NRD) of $P$ is maximum-sized instance on $n$ variables such that for every clause of the instance, there is an assignment which satisfies all but that clause. The study of NRD for various CSPs is an active area of research which combines ideas from extremal combinatorics, logic, lattice theory, and other techniques. Complete classifications are known in the cases $r=2$ and $(|D|=2, r=3)$. In this paper, we give a near-complete classification of the case $(|D|=2, r=4)$. Of the 400 distinct non-trivial Boolean predicates of arity 4, we implement an algorithmic procedure which perfectly classifies 397 of them. Of the remaining three, we solve two by reducing to extremal combinatorics problems -- leaving the last one as an open question. Along the way, we identify the first Boolean predicate whose non-redundancy asymptotics are non-polynomial.

ITApr 16
Explicit Constant-Alphabet Subspace Design Codes

Rohan Goyal, Venkatesan Guruswami, Jun-Ting Hsieh

The subspace design property for additive codes is a higher-dimensional generalization of the minimum distance property. As shown recently by Brakensiek, Chen, Dhar and Zhang, it implies that the code has similar performance as random linear codes with respect to all "local properties". Explicit algebraic codes, such as folded Reed-Solomon and multiplicity codes, are known to have the subspace design property, but they need alphabet sizes that grow as a large polynomial in the block length. Constructing explicit constant-alphabet subspace design codes was subsequently posed as an open question in Brakensiek, Chen, Dhar and Zhang. In this work, we answer their question and give explicit constructions of subspace design codes over constant-sized alphabets, using the expander-based Alon-Edmonds-Luby (AEL) framework. This generalizes the recent work of Jeronimo and Shagrithaya, which showed that such codes share local properties of random linear codes. Our work obtains this consequence in a unified manner via the subspace design property. In addition, our approach yields some improvements in parameters for list-recovery.

DMMay 18
Super-linear Lower Bounds for CSP Non-Redundancy via Shrinking Instances

Joshua Brakensiek, Venkatesan Guruswami, Bart M. P. Jansen et al.

The non-redundancy (NRD) of a constraint satisfaction problem (CSP) is a combinatorial quantity closely tied to the behavior of CSPs in various computational models including their sparsification, kernelization, and streaming complexity. A primary open question in the study of non-redundancy is the identification of which CSP predicates have near-linear NRD. Recent works by Carbonnel [CP 2022], Khanna, Putterman and Sudan [STOC 2025], Brakensiek and Guruswami [STOC 2025] and Brakensiek, Guruswami, Jansen, Lagerkvist, and Wahlström [2025] have introduced various forms of gadget reductions between CSPs to relate their non-redundancy. The primary contribution of this work is to recontextualize many of these gadget reductions in a framework which we call hypergraph projections. By studying a quantity we call the shrinking factor of these hypergraph projections, we can more precisely predict when a gadget reduction between predicates can yield a super-linear NRD lower bound, greatly improving on the analysis of previous works. To illustrate the power of our framework, we identify some concrete CSP predicates whose non-redundancy is at the cusp of our understanding and show how our methods give lower bounds that could not have been achieved with these previous methods. We also demonstrate how these gadget reductions can be automatically deduced using SAT solvers, thereby opening up novel computational avenues for discovering further relationships between the non-redundancy of various CSPs.

CCMay 12
Strong Inapproximability for a Promise Rank Problem

Venkatesan Guruswami, Xuandi Ren, Shaoxuan Tang

Given a linear subspace of $n \times n$ matrices over $\mathbb F_{2^r}$ that is promised to contain a matrix of rank $1$, we prove that it is hard to find a matrix of rank $n^{o(1/\log \log n)}$, assuming NP doesn't have sub-exponential algorithms. In addition to being a basic problem, the hardness of this problem, even for the exact version, drove recent PCP-free inapproximability results for minimum distance and shortest vector problems concerning codes and lattices. The proof combines the concept of superposition soundness introduced by Khot and Saket with moment matrices. To produce a rank-gap of $1$ vs. $k$, the reduction runs in time $n^{O(\log k)}$. We also give another moment-matrix-based construction which runs in time $n^{O(k)}$ but works for any finite field $\mathbb F_q$.

CCMar 28, 2024
Hardness of Learning Boolean Functions from Label Proportions

Venkatesan Guruswami, Rishi Saket

In recent years the framework of learning from label proportions (LLP) has been gaining importance in machine learning. In this setting, the training examples are aggregated into subsets or bags and only the average label per bag is available for learning an example-level predictor. This generalizes traditional PAC learning which is the special case of unit-sized bags. The computational learning aspects of LLP were studied in recent works (Saket, NeurIPS'21; Saket, NeurIPS'22) which showed algorithms and hardness for learning halfspaces in the LLP setting. In this work we focus on the intractability of LLP learning Boolean functions. Our first result shows that given a collection of bags of size at most $2$ which are consistent with an OR function, it is NP-hard to find a CNF of constantly many clauses which satisfies any constant-fraction of the bags. This is in contrast with the work of (Saket, NeurIPS'21) which gave a $(2/5)$-approximation for learning ORs using a halfspace. Thus, our result provides a separation between constant clause CNFs and halfspaces as hypotheses for LLP learning ORs. Next, we prove the hardness of satisfying more than $1/2 + o(1)$ fraction of such bags using a $t$-DNF (i.e. DNF where each term has $\leq t$ literals) for any constant $t$. In usual PAC learning such a hardness was known (Khot-Saket, FOCS'08) only for learning noisy ORs. We also study the learnability of parities and show that it is NP-hard to satisfy more than $(q/2^{q-1} + o(1))$-fraction of $q$-sized bags which are consistent with a parity using a parity, while a random parity based algorithm achieves a $(1/2^{q-2})$-approximation.

DSMar 14, 2024
Outlier Robust Multivariate Polynomial Regression

Vipul Arora, Arnab Bhattacharyya, Mathews Boban et al.

We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $χ$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $ρ< 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leqσ$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(σ)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $χ$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$, if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(σ)$, and the run-time depends on $\log(1/σ)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/σ$, at the cost of a higher sample complexity.

CRFeb 17, 2019
Leakage-Resilient Non-Malleable Secret Sharing in Non-compartmentalized Models

Fuchun Lin, Mahdi Cheraghchi, Venkatesan Guruswami et al.

Non-malleable secret sharing was recently proposed by Goyal and Kumar in independent tampering and joint tampering models for threshold secret sharing (STOC18) and secret sharing with general access structure (CRYPTO18). The idea of making secret sharing non-malleable received great attention and by now has generated many papers exploring new frontiers in this topic, such as multiple-time tampering and adding leakage resiliency to the one-shot tampering model. Non-compartmentalized tampering model was first studied by Agrawal et.al (CRYPTO15) for non-malleability against permutation composed with bit-wise independent tampering, and shown useful in constructing non-malleable string commitments. We initiate the study of leakage-resilient secret sharing in the non-compartmentalized model. The leakage adversary can corrupt several players and obtain their shares, as in normal secret sharing. The leakage adversary can apply arbitrary affine functions with bounded total output length to the full share vector and obtain the outputs as leakage. These two processes can be both non-adaptive and do not depend on each other, or both adaptive and depend on each other with arbitrary ordering. We construct such leakage-resilient secret sharing schemes and achieve constant information ratio (the scheme for non-adaptive adversary is near optimal). We then explore making the non-compartmentalized leakage-resilient secret sharing also non-malleable against tampering. We consider a tampering model, where the adversary can use the shares obtained from the corrupted players and the outputs of the global leakage functions to choose a tampering function from a tampering family F. We give two constructions of such leakage-resilient non-malleable secret sharing for the case F is the bit-wise independent tampering and, respectively, for the case F is the affine tampering functions.

CRAug 9, 2018
Secret Sharing with Binary Shares

Fuchun Lin, Mahdi Cheraghchi, Venkatesan Guruswami et al.

Shamir's celebrated secret sharing scheme provides an efficient method for encoding a secret of arbitrary length $\ell$ among any $N \leq 2^\ell$ players such that for a threshold parameter $t$, (i) the knowledge of any $t$ shares does not reveal any information about the secret and, (ii) any choice of $t+1$ shares fully reveals the secret. It is known that any such threshold secret sharing scheme necessarily requires shares of length $\ell$, and in this sense Shamir's scheme is optimal. The more general notion of ramp schemes requires the reconstruction of secret from any $t+g$ shares, for a positive integer gap parameter $g$. Ramp secret sharing scheme necessarily requires shares of length $\ell/g$. Other than the bound related to secret length $\ell$, the share lengths of ramp schemes can not go below a quantity that depends only on the gap ratio $g/N$. In this work, we study secret sharing in the extremal case of bit-long shares and arbitrarily small gap ratio $g/N$, where standard ramp secret sharing becomes impossible. We show, however, that a slightly relaxed but equally effective notion of semantic security for the secret, and negligible reconstruction error probability, eliminate the impossibility. Moreover, we provide explicit constructions of such schemes. One of the consequences of our relaxation is that, unlike standard ramp schemes with perfect secrecy, adaptive and non-adaptive adversaries need different analysis and construction. For non-adaptive adversaries, we explicitly construct secret sharing schemes that provide secrecy against any $τ$ fraction of observed shares, and reconstruction from any $ρ$ fraction of shares, for any choices of $0 \leq τ< ρ\leq 1$. Our construction achieves secret length $N(ρ-τ-o(1))$, which we show to be optimal. For adaptive adversaries, we construct explicit schemes attaining a secret length $Ω(N(ρ-τ))$.

ITSep 4, 2013
Non-Malleable Coding Against Bit-wise and Split-State Tampering

Mahdi Cheraghchi, Venkatesan Guruswami

Non-malleable coding, introduced by Dziembowski, Pietrzak and Wichs (ICS 2010), aims for protecting the integrity of information against tampering attacks in situations where error-detection is impossible. Intuitively, information encoded by a non-malleable code either decodes to the original message or, in presence of any tampering, to an unrelated message. Dziembowski et al. show existence of non-malleable codes for any class of tampering functions of bounded size. We consider constructions of coding schemes against two well-studied classes of tampering functions: bit-wise tampering functions (where the adversary tampers each bit of the encoding independently) and split-state adversaries (where two independent adversaries arbitrarily tamper each half of the encoded sequence). 1. For bit-tampering, we obtain explicit and efficiently encodable and decodable codes of length $n$ achieving rate $1-o(1)$ and error (security) $\exp(-\tildeΩ(n^{1/7}))$. We improve the error to $\exp(-\tildeΩ(n))$ at the cost of making the construction Monte Carlo with success probability $1-\exp(-Ω(n))$. Previously, the best known construction of bit-tampering codes was the Monte Carlo construction of Dziembowski et al. (ICS 2010) achieving rate ~.1887. 2. We initiate the study of seedless non-malleable extractors as a variation of non-malleable extractors introduced by Dodis and Wichs (STOC 2009). We show that construction of non-malleable codes for the split-state model reduces to construction of non-malleable two-source extractors. We prove existence of such extractors, which implies that codes obtained from our reduction can achieve rates arbitrarily close to 1/5 and exponentially small error. Currently, the best known explicit construction of split-state coding schemes is due to Aggarwal, Dodis and Lovett (ECCC TR13-081) which only achieves vanishing (polynomially small) rate.

ITSep 2, 2013
Capacity of Non-Malleable Codes

Mahdi Cheraghchi, Venkatesan Guruswami

Non-malleable codes, introduced by Dziembowski, Pietrzak and Wichs (ICS 2010), encode messages $s$ in a manner so that tampering the codeword causes the decoder to either output $s$ or a message that is independent of $s$. While this is an impossible goal to achieve against unrestricted tampering functions, rather surprisingly non-malleable coding becomes possible against every fixed family $F$ of tampering functions that is not too large (for instance, when $|F| \le \exp(2^{αn})$ for some $α\in [0, 1)$ where $n$ is the number of bits in a codeword). In this work, we study the "capacity of non-malleable coding", and establish optimal bounds on the achievable rate as a function of the family size, answering an open problem from Dziembowski et al. (ICS 2010). Specifically, 1. We prove that for every family $F$ with $|F| \le \exp(2^{αn})$, there exist non-malleable codes against $F$ with rate arbitrarily close to $1-α$ (this is achieved w.h.p. by a randomized construction). 2. We show the existence of families of size $\exp(n^{O(1)} 2^{αn})$ against which there is no non-malleable code of rate $1-α$ (in fact this is the case w.h.p for a random family of this size). 3. We also show that $1-α$ is the best achievable rate for the family of functions which are only allowed to tamper the first $αn$ bits of the codeword, which is of special interest. As a corollary, this implies that the capacity of non-malleable coding in the split-state model (where the tampering function acts independently but arbitrarily on the two halves of the codeword) equals 1/2. We also give an efficient Monte Carlo construction of codes of rate close to 1 with polynomial time encoding and decoding that is non-malleable against any fixed $c > 0$ and family $F$ of size $\exp(n^c)$, in particular tampering functions with, say, cubic size circuits.