LG DS FA Dec 14, 2015

Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets

arXiv:1512.04433v1 41 citations

Originality: Incremental advance

AI Analysis

This work addresses fundamental challenges in binary embeddings for machine learning and data compression, offering theoretical guarantees that are incremental improvements over prior bounds.

The paper tackles the problem of embedding arbitrary sets from the unit sphere into a Hamming cube with minimal distortion and sample complexity, deriving near-optimal bounds that depend on the Gaussian width of the set, such as m ≈ δ⁻²d for subspaces and m ≈ δ⁻⁴ω²(K) for general sets.

We study embedding a subset $K$ of the unit sphere into the Hamming cube $\{-1,+1\}^m$. We characterize the tradeoff between distortion and sample complexity $m$ in terms of the Gaussian width $\omega(K)$ of the set. For subspaces and several structured sets, we show that Gaussian maps provide the optimal tradeoff $m\sim\delta^{-2}\omega^2(K)$; in particular, for distortion $\delta$ one needs $m\approx\delta^{-2}d$, where $d$ is the subspace dimension. For general sets, we provide sharp characterizations which reduce to $m\approx\delta^{-4}\omega^2(K)$ after simplification. We provide improved results for local embedding of points that are in close proximity to each other, which is related to locality-sensitive hashing. We also discuss faster binary embedding, where one takes advantage of an initial sketching procedure based on the Fast Johnson-Lindenstrauss Transform. Finally, we list several numerical observations and discuss open problems.
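To make the Gaussian map in the abstract concrete, here is a minimal Python sketch, not the authors' code. It assumes the embedding is $x \mapsto \mathrm{sign}(Ax)$ with an i.i.d. standard Gaussian matrix $A$, and that distortion is measured as the gap between the normalized Hamming distance of the codes and the angular distance $\arccos(\langle x,y\rangle)/\pi$ of the points. The function names, dimensions, and random seed are illustrative choices and do not come from the paper.

```python
import numpy as np

def binary_embed(X, A):
    """Map each row x of X to sign(A x) in {-1, +1}^m (zero is sent to +1)."""
    return np.where(X @ A.T >= 0, 1, -1)

def normalized_hamming(b1, b2):
    """Fraction of coordinates in which two sign vectors differ."""
    return float(np.mean(b1 != b2))

def angular_distance(x, y):
    """Angle between unit vectors x and y, scaled to lie in [0, 1]."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    return float(np.arccos(c) / np.pi)

rng = np.random.default_rng(0)
n, d, m = 64, 4, 4000  # ambient dim, subspace dim, number of bits (all illustrative)

# Orthonormal basis of a random d-dimensional subspace of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Two unit vectors drawn from that subspace.
X = rng.standard_normal((2, d)) @ Q.T
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Gaussian measurement matrix and the resulting binary codes.
A = rng.standard_normal((m, n))
B = binary_embed(X, A)

err = abs(normalized_hamming(B[0], B[1]) - angular_distance(X[0], X[1]))
print(f"empirical distortion for this pair: {err:.4f}")
```

For a fixed pair of points, each bit differs with probability equal to the angular distance, so the gap shrinks roughly like $m^{-1/2}$. The paper's bounds concern the harder uniform guarantee over all pairs in $K$, which this single-pair check does not establish.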
