Simplex Random Features
This work provides a more accurate and efficient kernel approximation method for machine learning applications, such as scalable Transformers, though it is incremental within the random features domain.
The authors tackled the problem of approximating softmax and Gaussian kernels with random features, introducing Simplex Random Features (SimRFs) that achieve the smallest possible mean square error among unbiased estimates in a specific class, outperforming prior methods like Orthogonal Random Features without extra cost.
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors. We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels among the class of weight-independent geometrically-coupled positive random feature (PRF) mechanisms, substantially outperforming the previously most accurate Orthogonal Random Features at no observable extra cost. We present a more computationally expensive SimRFs+ variant, which we prove is asymptotically optimal in the broader family of weight-dependent geometrical coupling schemes (which permit correlations between random vector directions and norms). In extensive empirical studies, we show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.