Thibaut Horel

CR
7papers
178citations
Novelty61%
AI Score29

7 Papers

LGOct 4, 2023
Estimation of Models with Limited Data by Leveraging Shared Structure

Maryann Rui, Thibaut Horel, Munther Dahleh · mit

Modern data sets, such as those in healthcare and e-commerce, are often derived from many individuals or systems but have insufficient data from each source alone to separately estimate individual, often high-dimensional, model parameters. If there is shared structure among systems however, it may be possible to leverage data from other systems to help estimate individual parameters, which could otherwise be non-identifiable. In this paper, we assume systems share a latent low-dimensional parameter space and propose a method for recovering $d$-dimensional parameters for $N$ different linear systems, even when there are only $T<d$ observations per system. To do so, we develop a three-step algorithm which estimates the low-dimensional subspace spanned by the systems' parameters and produces refined parameter estimates within the subspace. We provide finite sample subspace estimation error guarantees for our proposed method. Finally, we experimentally validate our method on simulations with i.i.d. regression data and as well as correlated time series data.

STJun 10, 2020
Optimal Bounds between $f$-Divergences and Integral Probability Metrics

Rohit Agrawal, Thibaut Horel

The families of $f$-divergences (e.g. the Kullback-Leibler divergence) and Integral Probability Metrics (e.g. total variation distance or maximum mean discrepancies) are widely used to quantify the similarity between probability distributions. In this work, we systematically study the relationship between these two families from the perspective of convex duality. Starting from a tight variational representation of the $f$-divergence, we derive a generalization of the moment-generating function, which we show exactly characterizes the best lower bound of the $f$-divergence as a function of a given IPM. Using this characterization, we obtain new bounds while also recovering in a unified manner well-known results, such as Hoeffding's lemma, Pinsker's inequality and its extension to subgaussian functions, and the Hammersley-Chapman-Robbins bound. This characterization also allows us to prove new results on topological properties of the divergence which may be of independent interest.

DSJul 19, 2019
Data Structures Meet Cryptography: 3SUM with Preprocessing

Alexander Golovnev, Siyao Guo, Thibaut Horel et al.

This paper shows several connections between data structure problems and cryptography against preprocessing attacks. Our results span data structure upper bounds, cryptographic applications, and data structure lower bounds, as summarized next. First, we apply Fiat--Naor inversion, a technique with cryptographic origins, to obtain a data structure upper bound. In particular, our technique yields a suite of algorithms with space $S$ and (online) time $T$ for a preprocessing version of the $N$-input 3SUM problem where $S^3\cdot T = \widetilde{O}(N^6)$. This disproves a strong conjecture (Goldstein et al., WADS 2017) that there is no data structure that solves this problem for $S=N^{2-δ}$ and $T = N^{1-δ}$ for any constant $δ>0$. Secondly, we show equivalence between lower bounds for a broad class of (static) data structure problems and one-way functions in the random oracle model that resist a very strong form of preprocessing attack. Concretely, given a random function $F: [N] \to [N]$ (accessed as an oracle) we show how to compile it into a function $G^F: [N^2] \to [N^2]$ which resists $S$-bit preprocessing attacks that run in query time $T$ where $ST=O(N^{2-\varepsilon})$ (assuming a corresponding data structure lower bound on 3SUM). In contrast, a classical result of Hellman tells us that $F$ itself can be more easily inverted, say with $N^{2/3}$-bit preprocessing in $N^{2/3}$ time. We also show that much stronger lower bounds follow from the hardness of kSUM. Our results can be equivalently interpreted as security against adversaries that are very non-uniform, or have large auxiliary input, or as security in the face of a powerfully backdoored random oracle. Thirdly, we give non-adaptive lower bounds for 3SUM and a range of geometric problems which match the best known lower bounds for static data structure problems.

CRFeb 28, 2019
Unifying computational entropies via Kullback-Leibler divergence

Rohit Agrawal, Yi-Hsiu Chen, Thibaut Horel et al.

We introduce hardness in relative entropy, a new notion of hardness for search problems which on the one hand is satisfied by all one-way functions and on the other hand implies both next-block pseudoentropy and inaccessible entropy, two forms of computational entropy used in recent constructions of pseudorandom generators and statistically hiding commitment schemes, respectively. Thus, hardness in relative entropy unifies the latter two notions of computational entropy and sheds light on the apparent "duality" between them. Additionally, it yields a more modular and illuminating proof that one-way functions imply next-block inaccessible entropy, similar in structure to the proof that one-way functions imply next-block pseudoentropy (Vadhan and Zheng, STOC '12).

CRFeb 21, 2018
How to Subvert Backdoored Encryption: Security Against Adversaries that Decrypt All Ciphertexts

Thibaut Horel, Sunoo Park, Silas Richelson et al.

We study secure and undetectable communication in a world where governments can read all encrypted communications of citizens. We consider a world where the only permitted communication method is via a government-mandated encryption scheme, using government-mandated keys. Citizens caught trying to communicate otherwise (e.g., by encrypting strings which do not appear to be natural language plaintexts) will be arrested. The one guarantee we suppose is that the government-mandated encryption scheme is semantically secure against outsiders: a perhaps advantageous feature to secure communication against foreign entities. But what good is semantic security against an adversary that has the power to decrypt? Even in this pessimistic scenario, we show citizens can communicate securely and undetectably. Informally, there is a protocol between Alice and Bob where they exchange ciphertexts that look innocuous even to someone who knows the secret keys and thus sees the corresponding plaintexts. And yet, in the end, Alice will have transmitted her secret message to Bob. Our security definition requires indistinguishability between unmodified use of the mandated encryption scheme, and conversations using the mandated encryption scheme in a modified way for subliminal communication. Our topics may be thought to fall broadly within the realm of steganography: the science of hiding secret communication in innocent-looking messages, or cover objects. However, we deal with the non-standard setting of adversarial cover object distributions (i.e., a stronger-than-usual adversary). We leverage that our cover objects are ciphertexts of a secure encryption scheme to bypass impossibility results which we show for broader classes of steganographic schemes. We give several constructions of subliminal communication schemes based on any key exchange protocol with random messages (e.g., Diffie-Hellman).

STOct 4, 2015
The Proximal Robbins-Monro Method

Panos Toulis, Thibaut Horel, Edoardo M. Airoldi

The need for parameter estimation with massive datasets has reinvigorated interest in stochastic optimization and iterative estimation procedures. Stochastic approximations are at the forefront of this recent development as they yield procedures that are simple, general, and fast. However, standard stochastic approximations are often numerically unstable. Deterministic optimization, on the other hand, increasingly uses proximal updates to achieve numerical stability in a principled manner. A theoretical gap has thus emerged. While standard stochastic approximations are subsumed by the framework of Robbins and Monro (1951), there is no such framework for stochastic approximations with proximal updates. In this paper, we conceptualize a proximal version of the classical Robbins-Monro procedure. Our theoretical analysis demonstrates that the proposed procedure has important stability benefits over the classical Robbins-Monro procedure, while it retains the best known convergence rates. Exact implementations of the proximal Robbins-Monro procedure are challenging, but we show that approximate implementations lead to procedures that are easy to implement, and still dominate classical procedures by achieving numerical stability, practically without tradeoffs. Moreover, approximate proximal Robbins-Monro procedures can be applied even when the objective cannot be calculated analytically, and so they generalize stochastic proximal procedures currently in use.

SIMay 21, 2015
Inferring Graphs from Cascades: A Sparse Recovery Framework

Jean Pouget-Abadie, Thibaut Horel

In the Network Inference problem, one seeks to recover the edges of an unknown graph from the observations of cascades propagating over this graph. In this paper, we approach this problem from the sparse recovery perspective. We introduce a general model of cascades, including the voter model and the independent cascade model, for which we provide the first algorithm which recovers the graph's edges with high probability and $O(s\log m)$ measurements where $s$ is the maximum degree of the graph and $m$ is the number of nodes. Furthermore, we show that our algorithm also recovers the edge weights (the parameters of the diffusion process) and is robust in the context of approximate sparsity. Finally we prove an almost matching lower bound of $Ω(s\log\frac{m}{s})$ and validate our approach empirically on synthetic graphs.