ITMay 22
The Rate-Immediacy Barrier in Explicit Tree Code ConstructionsGil Cohen, Leonard J. Schulman, Piyush Srivastava
Since the introduction of tree codes by Schulman (STOC 1993), explicit construction of asymptotically good tree codes has remained a notorious challenge. A work by Cohen, Haeupler and Schulman (STOC 2018), as well as the state-of-the-art construction by Ben Yaacov, Cohen, and Yankovitz (STOC 2022) have achieved codes with rate $Ω(1/\log\log n)$, exponentially improving upon the original rate $Ω(1/\log n)$ construction of Evans, Klugerman and Schulman from 1994. All of these constructions rely, at least in part, on increasingly sophisticated methods of combining (block) error-correcting codes. In this work, we identify a fundamental barrier to constructing tree codes using known techniques. We introduce a key property which we call immediacy, that, while not required by the original definition of tree codes, is shared by all known constructions and inherently arises in recursive combinations of error-correcting codes. Our main technical contribution is the proof of a rate-immediacy trade-off, which, in particular, implies that any tree code with constant distance and non-trivial immediacy must necessarily have vanishing rate. By applying our rate-immediacy trade-off to existing constructions, we establish that their known rate analyses are essentially optimal given their actual error-correction properties. More broadly, our work highlights the need for fundamentally new ideas -- beyond the recursive use of error-correcting codes -- to achieve substantial progress in explicitly constructing asymptotically good tree codes.
LGMay 26
Innovation: An Almost Characterization of HallucinationNishant P. Das, Piyush Srivastava
Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigating it. Towards this, Kalai and Vempala (STOC 2024) introduced a probabilistic framework formalizing calibration and hallucination, and showed that, with high probability, calibrated LLMs hallucinate roughly at the rate of the "missing mass", a measure of how incomplete the training data is relative to its source. This raises two fundamental questions: (i) what property of a calibrated LLM makes hallucinations unavoidable? and (ii) can hallucinations be avoided by giving up calibration? We answer these questions by introducing a simpler property we call innovation that measures the tendency of a model to produce outputs outside the training data. We show that innovation is implied by the condition for hallucination identified by Kalai and Vempala, and, further, that it is an almost characterization of hallucination: hallucination implies innovation, and conversely, innovation implies hallucination with high probability. We also provide lower bounds on the hallucination rate based on the "innovation rate", and by relating innovation rate back to missing mass, we obtain new hallucination rate lower bounds based on missing mass that extend the results of Kalai and Vempala.
DSMar 14, 2025
Approximating the Total Variation Distance between GaussiansArnab Bhattacharyya, Weiming Feng, Piyush Srivastava
The total variation distance is a metric of central importance in statistics and probability theory. However, somewhat surprisingly, questions about computing it algorithmically appear not to have been systematically studied until very recently. In this paper, we contribute to this line of work by studying this question in the important special case of multivariate Gaussians. More formally, we consider the problem of approximating the total variation distance between two multivariate Gaussians to within an $ε$-relative error. Previous works achieved a fixed constant relative error approximation via closed-form formulas. In this work, we give algorithms that given any two $n$-dimensional Gaussians $D_1,D_2$, and any error bound $ε> 0$, approximate the total variation distance $D := d_{TV}(D_1,D_2)$ to $ε$-relative accuracy in $\text{poly}(n,\frac{1}ε,\log \frac{1}{D})$ operations. The main technical tool in our work is a reduction that helps us extend the recent progress on computing the TV-distance between discrete random variables to our continuous setting.
COMar 12
Deterministically approximating the volume of a Kostka polytopeHariharan Narayanan, Piyush Srivastava
Polynomial-time deterministic approximation of volumes of polytopes, up to an approximation factor that grows at most sub-exponentially with the dimension, remains an open problem. Recent work on this question has focused on identifying interesting classes of polytopes for which such approximation algorithms can be obtained. In this paper, we focus on one such class of polytopes: the Kostka polytopes. The volumes of Kostka polytopes appear naturally in questions of random matrix theory, in the context of evaluating the probability density that a random Hermitian matrix with fixed spectrum $λ$ has a given diagonal $μ$ (the so-called randomized Schur-Horn problem): the corresponding Kostka polytope is denoted $\mathrm{GT}(λ, μ)$. We give a polynomial-time deterministic algorithm for approximating the volume of a ($Ω(n^2)$ dimensional) Kostka polytope $\mathrm{GT}(λ, μ)$ to within a multiplicative factor of $\exp(O(n\log n))$, when $λ$ is an integral partition with $n$ parts, with entries bounded above by a polynomial in $n$, and $μ$ is an integer vector lying in the interior of the permutohedron (i.e., convex hull of all permutations) of $λ$. The algorithm thus gives asymptotically correct estimates of the log-volume of Kostka polytopes corresponding to such $(λ, μ)$. Our approach is based on a partition function interpretation of a continuous analogue of Schur polynomials.
LGNov 9, 2021
Universal Lower Bound for Learning Causal DAGs with Atomic InterventionsVibhor Porwal, Piyush Srivastava, Gaurav Sinha
A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. Our first result is a new universal lower bound on the number of single-node interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of single-node interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better. To prove our lower bound, we develop the notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. We also use the techniques developed here to extend our results to the setting of multi-node interventions.