CLFeb 3
Accelerating Scientific Research with Gemini: Case Studies and Common TechniquesDavid P. Woodruff, Vincent Cohen-Addad, Lalit Jain et al.
Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google's Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.
CRMay 3, 2021
Distributional Collision Resistance Beyond One-Way FunctionsNir Bitansky, Iftach Haitner, Ilan Komargodski et al.
Distributional collision resistance is a relaxation of collision resistance that only requires that it is hard to sample a collision $(x,y)$ where $x$ is uniformly random and $y$ is uniformly random conditioned on colliding with $x$. The notion lies between one-wayness and collision resistance, but its exact power is still not well-understood. On one hand, distributional collision resistant hash functions cannot be built from one-way functions in a black-box way, which may suggest that they are stronger. On the other hand, so far, they have not yielded any applications beyond one-way functions. Assuming distributional collision resistant hash functions, we construct \emph{constant-round} statistically hiding commitment scheme. Such commitments are not known based on one-way functions and are impossible to obtain from one-way functions in a black-box way. Our construction relies on the reduction from inaccessible entropy generators to statistically hiding commitments by Haitner et al.\ (STOC '09). In the converse direction, we show that two-message statistically hiding commitments imply distributional collision resistance, thereby establishing a loose equivalence between the two notions. A corollary of the first result is that constant-round statistically hiding commitments are implied by average-case hardness in the class $SZK$ (which is known to imply distributional collision resistance). This implication seems to be folklore, but to the best of our knowledge has not been proven explicitly. We provide yet another proof of this implication, which is arguably more direct than the one going through distributional collision resistance.
LGJan 22, 2021
Adversarial Laws of Large Numbers and Optimal Regret in Online ClassificationNoga Alon, Omri Ben-Eliezer, Yuval Dagan et al.
Laws of large numbers guarantee that given a large enough sample from some population, the measure of any fixed sub-population is well-estimated by its frequency in the sample. We study laws of large numbers in sampling processes that can affect the environment they are acting upon and interact with it. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020), and characterize the classes which admit a uniform law of large numbers in this model: these are exactly the classes that are \emph{online learnable}. Our characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application, we determine the optimal regret bounds in online learning, stated in terms of \emph{Littlestone's dimension}, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).
DSMar 31, 2020
A Framework for Adversarially Robust Streaming AlgorithmsOmri Ben-Eliezer, Rajesh Jayaram, David P. Woodruff et al.
We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally $F_p$-estimation, $F_p$-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust $(1+\varepsilon)$-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a $\text{poly}(\log n, 1/\varepsilon)$ multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.
DSJun 26, 2019
The Adversarial Robustness of SamplingOmri Ben-Eliezer, Eylon Yogev
Random sampling is a fundamental primitive in modern algorithms, statistics, and machine learning, used as a generic method to obtain a small yet "representative" subset of the data. In this work, we investigate the robustness of sampling against adaptive adversarial attacks in a streaming setting: An adversary sends a stream of elements from a universe $U$ to a sampling algorithm (e.g., Bernoulli sampling or reservoir sampling), with the goal of making the sample "very unrepresentative" of the underlying data stream. The adversary is fully adaptive in the sense that it knows the exact content of the sample at any given point along the stream, and can choose which element to send next accordingly, in an online manner. Well-known results in the static setting indicate that if the full stream is chosen in advance (non-adaptively), then a random sample of size $Ω(d / \varepsilon^2)$ is an $\varepsilon$-approximation of the full data with good probability, where $d$ is the VC-dimension of the underlying set system $(U,R)$. Does this sample size suffice for robustness against an adaptive adversary? The simplistic answer is \emph{negative}: We demonstrate a set system where a constant sample size (corresponding to VC-dimension $1$) suffices in the static setting, yet an adaptive adversary can make the sample very unrepresentative, as long as the sample size is (strongly) sublinear in the stream length, using a simple and easy-to-implement attack. However, this attack is "theoretical only", requiring the set system size to (essentially) be exponential in the stream length. This is not a coincidence: We show that to make Bernoulli or reservoir sampling robust against adaptive adversaries, the modification required is solely to replace the VC-dimension term $d$ in the sample size with the cardinality term $\log |R|$. This nearly matches the bound imposed by the attack.
DSDec 4, 2017
Distributed Algorithms Made Secure: A Graph Theoretic ApproachMerav Parter, Eylon Yogev
In the area of distributed graph algorithms a number of network's entities with local views solve some computational task by exchanging messages with their neighbors. Quite unfortunately, an inherent property of most existing distributed algorithms is that throughout the course of their execution, the nodes get to learn not only their own output but rather learn quite a lot on the inputs or outputs of many other entities. This leakage of information might be a major obstacle in settings where the output (or input) of network's individual is a private information. In this paper, we introduce a new framework for \emph{secure distributed graph algorithms} and provide the first \emph{general compiler} that takes any "natural" non-secure distributed algorithm that runs in $r$ rounds, and turns it into a secure algorithm that runs in $\widetilde{O}(r \cdot D \cdot poly(Δ))$ rounds where $Δ$ is the maximum degree in the graph and $D$ is its diameter. The security of the compiled algorithm is information-theoretic but holds only against a semi-honest adversary that controls a single node in the network. This compiler is made possible due to a new combinatorial structure called \emph{private neighborhood trees}: a collection of $n$ trees $T(u_1),\ldots,T(u_n)$, one for each vertex $u_i \in V(G)$, such that each tree $T(u_i)$ spans the neighbors of $u_i$ {\em without going through $u_i$}. Intuitively, each tree $T(u_i)$ allows all neighbors of $u_i$ to exchange a \emph{secret} that is hidden from $u_i$, which is the basic graph infrastructure of the compiler. In a $(d,c)$-private neighborhood trees each tree $T(u_i)$ has depth at most $d$ and each edge $e \in G$ appears in at most $c$ different trees. We show a construction of private neighborhood trees with $d=\widetilde{O}(Δ\cdot D)$ and $c=\widetilde{O}(D)$.
IRNov 1, 2016
CB2CF: A Neural Multiview Content-to-Collaborative Filtering Model for Completely Cold Item RecommendationsOren Barkan, Noam Koenigstein, Eylon Yogev et al.
In Recommender Systems research, algorithms are often characterized as either Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained using a dataset of user preferences while CB algorithms are typically based on item profiles. These approaches harness different data sources and therefore the resulting recommended items are generally very different. This paper presents the CB2CF, a deep neural multiview model that serves as a bridge from items content into their CF representations. CB2CF is a real-world algorithm designed for Microsoft Store services that handle around a billion users worldwide. CB2CF is demonstrated on movies and apps recommendations, where it is shown to outperform an alternative CB model on completely cold items.
CRDec 29, 2014
Bloom Filters in Adversarial EnvironmentsMoni Naor, Eylon Yogev
Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider a data structure known as "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set $S$ of elements approximately, by using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any $x \in S$ it should always answer `Yes', and for any $x \notin S$ it should answer `Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are only bounded in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries we show that there exists a Bloom filter for sets of size $n$ and error $\varepsilon$, that is secure against $t$ queries and uses only $O(n \log{\frac{1}{\varepsilon}}+t)$ bits of memory. In comparison, $n\log{\frac{1}{\varepsilon}}$ is the best possible under a non-adaptive adversary.
CRMar 22, 2014
Secret-Sharing for NPIlan Komargodski, Moni Naor, Eylon Yogev
A computational secret-sharing scheme is a method that enables a dealer, that has a secret, to distribute this secret among a set of parties such that a "qualified" subset of parties can efficiently reconstruct the secret while any "unqualified" subset of parties cannot efficiently learn anything about the secret. The collection of "qualified" subsets is defined by a Boolean function. It has been a major open problem to understand which (monotone) functions can be realized by a computational secret-sharing schemes. Yao suggested a method for secret-sharing for any function that has a polynomial-size monotone circuit (a class which is strictly smaller than the class of monotone functions in P). Around 1990 Rudich raised the possibility of obtaining secret-sharing for all monotone functions in NP: In order to reconstruct the secret a set of parties must be "qualified" and provide a witness attesting to this fact. Recently, Garg et al. (STOC 2013) put forward the concept of witness encryption, where the goal is to encrypt a message relative to a statement "x in L" for a language L in NP such that anyone holding a witness to the statement can decrypt the message, however, if x is not in L, then it is computationally hard to decrypt. Garg et al. showed how to construct several cryptographic primitives from witness encryption and gave a candidate construction. One can show that computational secret-sharing implies witness encryption for the same language. Our main result is the converse: we give a construction of a computational secret-sharing scheme for any monotone function in NP assuming witness encryption for NP and one-way functions. As a consequence we get a completeness theorem for secret-sharing: computational secret-sharing scheme for any single monotone NP-complete function implies a computational secret-sharing scheme for every monotone function in NP.