Kernel computations from large-scale random features obtained by Optical Processing Units
This work addresses the problem of scaling kernel methods to large datasets. It offers machine learning researchers and practitioners a hardware-accelerated solution, though the contribution is incremental, since it builds on existing random-feature and optical-hardware concepts.
The paper tackles the computational challenge of large-scale random features for kernel approximations by leveraging Optical Processing Units (OPUs) to perform fast and energy-efficient analog computations, achieving competitive performance in kernel ridge regression and image classification tasks with significant time and energy savings.
Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, new optical hardware called the Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain. More specifically, the OPU multiplies input vectors by a large random matrix with complex-valued i.i.d. Gaussian entries and then applies an element-wise squared absolute value; this nonlinearity is intrinsic to the sensing process. In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. Experiments demonstrate that the OPU kernel and its RF approximation achieve competitive performance in applications using kernel ridge regression and transfer learning for image classification. Crucially, thanks to the use of the OPU, these results are obtained with time and energy savings.
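The operation described above can be illustrated with a small software simulation. The following is a minimal NumPy sketch, not the OPU's actual interface: the function names, the problem sizes, and the normalization are assumptions made for illustration. It assumes the random matrix entries have independent real and imaginary parts, each with variance 1/2, in which case the standard fourth-moment identity for circular complex Gaussians gives the limiting kernel k(x, y) = ||x||^2 ||y||^2 + (x^T y)^2. A different variance convention would only rescale this expression.

```python
import numpy as np


def simulated_opu_features(X, W):
    """Software stand-in for the optical transform: |X W^T|^2, scaled by 1/sqrt(m).

    X: real array of shape (n, d), one input vector per row.
    W: complex array of shape (m, d) with i.i.d. Gaussian entries.
    """
    m = W.shape[0]
    return np.abs(X @ W.T) ** 2 / np.sqrt(m)


def limiting_kernel(X, Y):
    """k(x, y) = ||x||^2 ||y||^2 + (x^T y)^2 under the assumed unit-variance entries."""
    sq_norms_x = np.sum(X ** 2, axis=1)[:, None]
    sq_norms_y = np.sum(Y ** 2, axis=1)[None, :]
    return sq_norms_x * sq_norms_y + (X @ Y.T) ** 2


rng = np.random.default_rng(0)
n, d, m = 5, 8, 50_000  # hypothetical sizes, chosen only for this demonstration

X = rng.standard_normal((n, d))

# Complex Gaussian matrix: real and imaginary parts each have variance 1/2 (assumption).
W = (rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))) / np.sqrt(2)

Phi = simulated_opu_features(X, W)   # random features, shape (n, m)
K_approx = Phi @ Phi.T               # Monte Carlo estimate of the kernel matrix
K_exact = limiting_kernel(X, X)      # closed-form limit as m grows

rel_err = np.max(np.abs(K_approx - K_exact) / K_exact)
print(f"max relative error with m={m}: {rel_err:.4f}")
```

The inner products of the simulated features approach the closed-form kernel as the number of features m grows, with error shrinking roughly as 1/sqrt(m). The (x^T y)^2 term is what connects this kernel to the degree-2 polynomial kernel.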