Amit Sahai

CR
h-index2
12papers
657citations
Novelty71%
AI Score63

12 Papers

ITMay 28
List Recovery for Random Low-Rate Linear Codes

Isaac M Hair, Amit Sahai

We prove a list recovery guarantee for random low-rate linear codes over sufficiently large prime fields. For fixed dimension $d$, error fraction $α$, and accuracy parameter $\varepsilon$, a random $d$-dimensional linear code $C \subseteq \mathbb{F}_p^n$ is, with high probability, $(α,\ell,\frac{1+\varepsilon}{1-α}\ell)$-list recoverable simultaneously for all input list sizes $\ell\le 2^{O_{α, \varepsilon, d}(n/\log n)}$. The proof is inspired by work of Matoušek, Př\'ıvětivý, and Škovroň on reconstructing point sets from their projections. It combines a deterministic graph-theoretic certificate, a nonvanishing determinant criterion, and the Schwartz--Zippel lemma. We also give a lower bound showing that any linear code $C \subseteq \mathbb{F}_p^n$ of dimension at least two cannot be $(α,\ell,\frac{1+\varepsilon}{1-α}\ell)$-list recoverable for feasible list sizes $\ell \geq 2^{Ω_{α, \varepsilon}(n)}$. In this sense, our result is nearly optimal.

CLMay 6
SLAM: Structural Linguistic Activation Marking for Language Models

Fabrice Harel-Canada, Amit Sahai

LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding linguistic structure (e.g., voice, tense, clause order), and we causally steer those directions at generation time, leaving lexical sampling and semantics unconstrained. On Gemma-2 2B and 9B, SLAM achieves 100% detection accuracy with a quality cost of only 1-2 reward points - compared to 7.5-11.5 for KGW, EWD, and Unigram - with naturalness and diversity preserved at near-unwatermarked levels across both models. The trade-off is a complementary robustness profile: SLAM resists word-level edits but is vulnerable to paraphrase that restructures syntax (at a quality cost), the converse of token-distribution methods.

CRApr 12
Public Key Encryption from High-Corruption Constraint Satisfaction Problems

Isaac M Hair, Amit Sahai

We give a public key encryption scheme with plausible quasi-exponential security based on the conjectured intractability of two constraint satisfaction problems (CSPs), both of which are instantiated with a corruption rate of $1 - o(1)$. First, we conjecture the hardness of a new large alphabet random predicate CSP (LARP-CSP) defined over an arbitrary but strongly expanding factor graph, where the vast majority of predicate outputs are replaced with random outputs. Second, we conjecture the hardness of the standard $k$XOR problem defined over a random factor graph, again where the vast majority of parity computations are replaced with random bits. In support of our hardness conjecture for LARP-CSPs, we give a variety of lower bounds, ruling out many natural attacks including all known attacks that exploit non-random factor graphs. Our public key encryption scheme is the first to leverage high corruption CSPs while simultaneously achieving a plausible security level far above quasi-polynomial. At the heart of our work is a new method for planting cryptographic trapdoors based on the label extended factor graph for a CSP. Along the way to achieving our result, we give the first uniform construction of an error-correcting code that has an expanding, low density generator matrix while simultaneously allowing for efficient decoding from a $1 - o(1)$ fraction of corruptions.

CCApr 1
Deterministic Hardness of Approximation For SVP in all Finite $\ell_p$ Norms

Isaac M Hair, Amit Sahai

We show that, assuming NP $\not\subseteq$ $\cap_{δ> 0}$DTIME$\left(\exp{n^δ}\right)$, the shortest vector problem for lattices of rank $n$ in any finite $\ell_p$ norm is hard to approximate within a factor of $2^{(\log n)^{1 - o(1)}}$, via a deterministic reduction. Previously, for the Euclidean case $p=2$, even hardness of the exact shortest vector problem was not known under a deterministic reduction.

CCMar 29
SVP$_p$ is Deterministically NP-Hard for all $p > 2$, Even to Approximate Within a Factor of $2^{\log^{1-\varepsilon} n}$

Isaac M. Hair, Amit Sahai

We prove that SVP$_p$ is NP-hard to approximate within a factor of $2^{\log^{1 - \varepsilon} n}$, for all constants $\varepsilon > 0$ and $p > 2$, under standard deterministic Karp reductions. This result is also the first proof that \emph{exact} SVP$_p$ is NP-hard in a finite $\ell_p$ norm. Hardness for SVP$_p$ with $p$ finite was previously only known if NP $\not \subseteq$ RP, and under that assumption, hardness of approximation was only known for all constant factors. As a corollary to our main theorem, we show that under the Sliding Scale Conjecture, SVP$_p$ is NP-hard to approximate within a small polynomial factor, for all constants $p > 2$. Our proof techniques are surprisingly elementary; we reduce from a \emph{regularized} PCP instance directly to the shortest vector problem by using simple gadgets related to Vandermonde matrices and Hadamard matrices.

LGMar 13
TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?

Alexander K Taylor, Junyi Zhang, Ethan Ji et al.

Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so current ATP training and evaluation are heavily biased toward MathLib's definitional framework. However, frontier mathematics is often exploratory and prototype-heavy, relying on bespoke constructions that deviate from standard libraries. In this work, we evaluate the robustness of current ATP systems when applied to a novel definitional framework, specifically examining the performance gap between standard library problems and bespoke mathematical constructions. We introduce TaoBench, an undergraduate-level benchmark derived from Terence Tao's Analysis I, which formalizes analysis by constructing core mathematical concepts from scratch, without relying on standard Mathlib definitions, as well as by mixing from-scratch and MathLib constructions. For fair evaluation, we build an agentic pipeline that automatically extracts a compilable, self-contained local environment for each problem. To isolate the effect of definitional frameworks, we additionally translate every problem into a mathematically equivalent Mathlib formulation, yielding paired TaoBench-Mathlib statements for direct comparison. While state-of-the-art ATP models perform capably within the MathLib framework, performance drops by an average of roughly 26% on the definitionally equivalent Tao formulation. This indicates that the main bottleneck is limited generalization across definitional frameworks rather than task difficulty. TaoBench thus highlights a gap between benchmark performance and applicability, and provides a concrete foundation for developing and testing provers better aligned with research mathematics.

CRMay 11, 2025
Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking

Fabrice Y Harel-Canada, Boran Erol, Connor Choi et al.

Watermarking AI-generated text is critical for combating misuse. Yet recent theoretical work argues that any watermark can be erased via random walk attacks that perturb text while preserving quality. However, such attacks rely on two key assumptions: (1) rapid mixing (watermarks dissolve quickly under perturbations) and (2) reliable quality preservation (automated quality oracles perfectly guide edits). Through large-scale experiments and human-validated assessments, we find mixing is slow: 100% of perturbed texts retain traces of their origin after hundreds of edits, defying rapid mixing. Oracles falter, as state-of-the-art quality detectors misjudge edits (77% accuracy), compounding errors during attacks. Ultimately, attacks underperform: automated walks remove watermarks just 26% of the time -- dropping to 10% under human quality review. These findings challenge the inevitability of watermark removal. Instead, practical barriers -- slow mixing and imperfect quality control -- reveal watermarking to be far more robust than theoretical models suggest. The gap between idealized attacks and real-world feasibility underscores the need for stronger watermarking methods and more realistic attack models.

CLJun 18, 2024
Measuring Psychological Depth in Language Models

Fabrice Harel-Canada, Hanyu Zhou, Sreya Muppalla et al.

Evaluations of creative stories generated by large language models (LLMs) often focus on objective properties of the text, such as its style, coherence, and diversity. While these metrics are indispensable, they do not speak to a story's subjective, psychological impact from a reader's perspective. We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories that provoke emotion, empathy, and engagement. We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha). We also explore techniques for automating the PDS to easily scale future analyses. GPT-4o, combined with a novel Mixture-of-Personas (MoP) prompting strategy, achieves an average Spearman correlation of 0.51 with human judgment while Llama-3-70B with constrained decoding scores as high as 0.68 for empathy. Finally, we compared the depth of stories authored by both humans and LLMs. Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit. By shifting the focus from text to reader, the Psychological Depth Scale is a validated, automated, and systematic means of measuring the capacity of LLMs to connect with humans through the stories they tell.

CRAug 21, 2020
Indistinguishability Obfuscation from Well-Founded Assumptions

Aayush Jain, Huijia Lin, Amit Sahai

In this work, we show how to construct indistinguishability obfuscation from subexponential hardness of four well-founded assumptions. We prove: Let $τ\in (0,\infty), δ\in (0,1), ε\in (0,1)$ be arbitrary constants. Assume sub-exponential security of the following assumptions, where $λ$ is a security parameter, and the parameters $\ell,k,n$ below are large enough polynomials in $λ$: - The SXDH assumption on asymmetric bilinear groups of a prime order $p = O(2^λ)$, - The LWE assumption over $\mathbb{Z}_{p}$ with subexponential modulus-to-noise ratio $2^{k^ε}$, where $k$ is the dimension of the LWE secret, - The LPN assumption over $\mathbb{Z}_p$ with polynomially many LPN samples and error rate $1/\ell^δ$, where $\ell$ is the dimension of the LPN secret, - The existence of a Boolean PRG in $\mathsf{NC}^0$ with stretch $n^{1+τ}$, Then, (subexponentially secure) indistinguishability obfuscation for all polynomial-size circuits exists.

CRSep 28, 2018
Expander Graphs are Non-Malleable Codes

Peter M. R. Rasmussen, Amit Sahai

Any $d$-regular graph on $n$ vertices with spectral expansion $λ$ satisfying $n = Ω(d^3\log(d)/λ)$ yields a $O\left(\frac{λ^{3/2}}{d}\right)$-non-malleable code for single-bit messages in the split-state model.

CROct 19, 2012
Attribute-Based Encryption for Circuits from Multilinear Maps

Amit Sahai, Brent Waters

In this work, we provide the first construction of Attribute-Based Encryption (ABE) for general circuits. Our construction is based on the existence of multilinear maps. We prove selective security of our scheme in the standard model under the natural multilinear generalization of the BDDH assumption. Our scheme achieves both Key-Policy and Ciphertext-Policy variants of ABE.

CROct 13, 2012
On Constant-Round Concurrent Zero-Knowledge from a Knowledge Assumption

Divya Gupta, Amit Sahai

In this work, we consider the long-standing open question of constructing constant-round concurrent zero-knowledge protocols in the plain model. Resolving this question is known to require non-black-box techniques. We consider non-black-box techniques for zero-knowledge based on knowledge assumptions, a line of thinking initiated by the work of Hada and Tanaka (CRYPTO 1998). Prior to our work, it was not known whether knowledge assumptions could be used for achieving security in the concurrent setting, due to a number of significant limitations that we discuss here. Nevertheless, we obtain the following results: 1. We obtain the first constant round concurrent zero-knowledge argument for \textbf{NP} in the plain model based on a new variant of knowledge of exponent assumption. Furthermore, our construction avoids the inefficiency inherent in previous non-black-box techniques such that those of Barak (FOCS 2001); we obtain our result through an efficient protocol compiler. 2. Unlike Hada and Tanaka, we do not require a knowledge assumption to argue the soundness of our protocol. Instead, we use a discrete log like assumption, which we call Diffie-Hellman Logarithm Assumption, to prove the soundness of our protocol. 3. We give evidence that our new variant of knowledge of exponent assumption is in fact plausible. In particular, we show that our assumption holds in the generic group model. 4. Knowledge assumptions are especially delicate assumptions whose plausibility may be hard to gauge. We give a novel framework to express knowledge assumptions in a more flexible way, which may allow for formulation of plausible assumptions and exploration of their impact and application in cryptography.