Ruben Ohana

LG
h-index46
24papers
440citations
Novelty57%
AI Score57

24 Papers

LGOct 4, 2023Code
Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker et al. · cambridge

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.

MLOct 4, 2023
xVal: A Continuous Numerical Tokenization for Scientific Language Models

Siavash Golkar, Mariel Pettee, Michael Eickenberg et al. · cambridge

Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.

MLJun 7, 2022
Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances

Ruben Ohana, Kimia Nadjahi, Alain Rakotomamonjy et al.

The Sliced-Wasserstein distance (SW) is a computationally efficient and theoretically grounded alternative to the Wasserstein distance. Yet, the literature on its statistical properties -- or, more accurately, its generalization properties -- with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage the PAC-Bayesian theory and a central observation that SW may be interpreted as an average risk, the quantity PAC-Bayesian bounds have been designed to characterize. We provide three types of results: i) PAC-Bayesian generalization bounds that hold on what we refer as adaptive Sliced-Wasserstein distances, i.e. SW defined with respect to arbitrary distributions of slices (among which data-dependent distributions), ii) a principled procedure to learn the distribution of slices that yields maximally discriminative SW, by optimizing our theoretical bounds, and iii) empirical illustrations of our theoretical findings.

COOct 25, 2023
Removing Dust from CMB Observations with Diffusion Models

David Heurtel-Depeiges, Blakesley Burkhart, Ruben Ohana et al.

In cosmology, the quest for primordial $B$-modes in cosmic microwave background (CMB) observations has highlighted the critical need for a refined model of the Galactic dust foreground. We investigate diffusion-based modeling of the dust foreground and its interest for component separation. Under the assumption of a Gaussian CMB with known cosmology (or covariance matrix), we show that diffusion models can be trained on examples of dust emission maps such that their sampling process directly coincides with posterior sampling in the context of component separation. We illustrate this on simulated mixtures of dust emission and CMB. We show that common summary statistics (power spectrum, Minkowski functionals) of the components are well recovered by this process. We also introduce a model conditioned by the CMB cosmology that outperforms models trained using a single cosmology on component separation. Such a model will be used in future work for diffusion-based cosmological inference.

LGNov 30, 2024Code
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Ruben Ohana, Michael McCabe, Lucas Meyer et al. · cambridge

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data is available at https://github.com/PolymathicAI/the_well.

ETSep 1, 2024
Streamlined optical training of large-scale modern deep learning architectures with direct feedback alignment

Ziao Wang, Kilian Müller, Matthew Filipovich et al.

Modern deep learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOPS under 30 Watts of power. We perform optical training of modern deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performances on language, vision, and diffusion-based generative tasks. We study the scaling of the training time, and demonstrate a potential advantage of our hybrid opto-electronic approach for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.

68.3LGMar 11
On the Value of Tokeniser Pretraining in Physics Foundation Models

Hadi Sotoudeh, Payel Mukhopadhyay, Ruben Ohana et al.

We investigate the impact of tokeniser pretraining on the accuracy and efficiency of physics emulation. Modern high-resolution simulations produce vast volumes of data spanning diverse physical regimes and scales. Training foundation models to learn the dynamics underlying such data enables the modelling of complex multiphysics phenomena, especially in data-limited settings. The emerging class of physics foundation models typically aims to learn two tasks jointly: (i) extracting compact representations of high-resolution spatiotemporal data, and (ii) capturing governing physical dynamics. However, learning both tasks from scratch simultaneously can impede the effectiveness of either process. We show that pretraining the tokeniser with an autoencoding objective prior to training the dynamics model enhances computational efficiency for physics emulation. Notably, the magnitude of this benefit depends on domain alignment: pretraining on the same physical system as the emulation task yields the largest improvements, while pretraining on other systems provides moderate gains. In-domain pretraining reduces VRMSE by 64% after 10,500 training steps compared to training from scratch. To our knowledge, this is the first systematic investigation of tokeniser pretraining for physics foundation models. We further introduce flexible spatiotemporal compression operations that extend causal convolutions to support runtime-adjustable compression ratios, enabling efficient adaptation to diverse downstream tasks. Our findings provide practical guidance for training efficient physics emulators and highlight the importance of strategic pretraining data selection.

CVJan 6, 2021Code
Adversarial Robustness by Design through Analog Computing and Synthetic Gradients

Alessandro Cappelli, Ruben Ohana, Julien Launay et al.

We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we find we are unable to build a reliable backward differentiable approximation for obfuscated parameters. Moreover, while our model reaches a good natural accuracy with a hybrid backpropagation - synthetic gradient method, the same approach is suboptimal if employed to generate adversarial examples. We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks. Finally, our hybrid training method builds robust features against transfer attacks. We demonstrate our approach on a VGG-like architecture, placing the defense on top of the convolutional features, on CIFAR-10 and CIFAR-100. Code is available at https://github.com/lightonai/adversarial-robustness-by-design.

LGJul 3, 2025
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

François Rozet, Ruben Ohana, Michael McCabe et al.

The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.

MLFeb 29, 2024
Listening to the Noise: Blind Denoising with Gibbs Diffusion

David Heurtel-Depeiges, Charles C. Margossian, Ruben Ohana et al.

In recent years, denoising problems have become intertwined with the development of deep generative models. In particular, diffusion models are trained like denoisers, and the distribution they model coincide with denoising priors in the Bayesian picture. However, denoising through diffusion-based posterior sampling requires the noise level and covariance to be known, preventing blind denoising. We overcome this limitation by introducing Gibbs Diffusion (GDiff), a general methodology addressing posterior sampling of both the signal and the noise parameters. Assuming arbitrary parametric Gaussian noise, we develop a Gibbs algorithm that alternates sampling steps from a conditional diffusion model trained to map the signal prior to the family of noise distributions, and a Monte Carlo sampler to infer the noise parameters. Our theoretical analysis highlights potential pitfalls, guides diagnostic usage, and quantifies errors in the Gibbs stationary distribution caused by the diffusion model. We showcase our method for 1) blind denoising of natural images involving colored noises with unknown amplitude and spectral index, and 2) a cosmology problem, namely the analysis of cosmic microwave background data, where Bayesian inference of "noise" parameters means constraining models of the evolution of the Universe.

LGJul 12, 2025
Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations

Payel Mukhopadhyay, Michael McCabe, Ruben Ohana et al. · cambridge

Patch-based transformer surrogates have become increasingly effective for modeling spatiotemporal dynamics, but the fixed patch size is a major limitation for budget-conscience deployment in production. We introduce two lightweight, architecture-agnostic modules-the Convolutional Kernel Modulator (CKM) and Convolutional Stride Modulator (CSM)-that enable dynamic patch size control at inference in patch based models, without retraining or accuracy loss. Combined with a cyclic patch-size rollout, our method mitigates patch artifacts and improves long-term stability for video-like prediction tasks. Applied to a range of challenging 2D and 3D PDE benchmarks, our approach improves rollout fidelity and runtime efficiency. To our knowledge, this is the first framework to enable inference-time patch-size tunability in patch-based PDE surrogates. Its plug-and-play design makes it broadly applicable across architectures-establishing a general foundation for compute-adaptive modeling in PDE surrogate tasks.

LGNov 19, 2025
Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

Michael McCabe, Payel Mukhopadhyay, Tanya Marwah et al. · cambridge

Foundation models have transformed machine learning for language and vision, but achieving comparable impact in physical simulation remains a challenge. Data heterogeneity and unstable long-term dynamics inhibit learning from sufficiently diverse dynamics, while varying resolutions and dimensionalities challenge efficient training on modern hardware. Through empirical and theoretical analysis, we incorporate new approaches to mitigate these obstacles, including a harmonic-analysis-based stabilization method, load-balanced distributed 2D and 3D training strategies, and compute-adaptive tokenization. Using these tools, we develop Walrus, a transformer-based foundation model developed primarily for fluid-like continuum dynamics. Walrus is pretrained on nineteen diverse scenarios spanning astrophysics, geoscience, rheology, plasma physics, acoustics, and classical fluids. Experiments show that Walrus outperforms prior foundation models on both short and long term prediction horizons on downstream tasks and across the breadth of pretraining data, while ablation studies confirm the value of our contributions to forecast stability, training throughput, and transfer performance over conventional approaches. Code and weights are released for community use.

LGNov 24, 2025
Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme

Rudy Morel, Francesco Pio Ramunno, Jeff Shen et al.

Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. However, in many settings, the available information at a given time represents only a small fraction of what is needed to predict future states, either due to measurement uncertainty or because only a small fraction of the state can be observed. This is true for example in solar physics, where we can observe the Sun's surface and atmosphere, but its evolution is driven by internal processes for which we lack direct measurements. In this paper, we tackle the probabilistic prediction of partially observable, long-memory dynamical systems, with applications to solar dynamics and the evolution of active regions. We show that standard inference schemes, such as autoregressive rollouts, fail to capture long-range dependencies in the data, largely because they do not integrate past information effectively. To overcome this, we propose a multiscale inference scheme for diffusion models, tailored to physical processes. Our method generates trajectories that are temporally fine-grained near the present and coarser as we move farther away, which enables capturing long-range temporal dependencies without increasing computational cost. When integrated into a diffusion model, we show that our inference scheme significantly reduces the bias of the predicted distributions and improves rollout stability.

IMOct 20, 2025
Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning

Jeff Shen, Francois Lanusse, Liam Holden Parker et al. · cambridge

Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs. infrared) and object types (e.g., stars vs. galaxies), limiting the ability to pool information across datasets. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. Our universal spectral tokenizer processes spectra from a variety of object types and resolutions directly on their native wavelength grids, producing intrinsically aligned, homogeneous, and physically meaningful representations that can be efficiently adapted to achieve competitive performance across a range of downstream tasks. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains, suggesting that our model can serve as a powerful building block for foundation models in astronomy -- and potentially extend to other scientific domains with heterogeneous sequential data, such as climate and healthcare.

CVOct 10, 2025
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control

Minkyoung Cho, Ruben Ohana, Christian Jacobsen et al.

Current controllable diffusion models typically rely on fixed architectures that modify intermediate activations to inject guidance conditioned on a new modality. This approach uses a static conditioning strategy for a dynamic, multi-stage denoising process, limiting the model's ability to adapt its response as the generation evolves from coarse structure to fine detail. We introduce TC-LoRA (Temporally Modulated Conditional LoRA), a new paradigm that enables dynamic, context-aware control by conditioning the model's weights directly. Our framework uses a hypernetwork to generate LoRA adapters on-the-fly, tailoring weight modifications for the frozen backbone at each diffusion step based on time and the user's condition. This mechanism enables the model to learn and execute an explicit, adaptive strategy for applying conditional guidance throughout the entire generation process. Through experiments on various data domains, we demonstrate that this dynamic, parametric control significantly enhances generative fidelity and adherence to spatial conditions compared to static, activation-based methods. TC-LoRA establishes an alternative approach in which the model's conditioning strategy is modified through a deeper functional adaptation of its weights, allowing control to align with the dynamic demands of the task and generative stage.

IMOct 4, 2023
AstroCLIP: A Cross-Modal Foundation Model for Galaxies

Liam Parker, Francois Lanusse, Siavash Golkar et al.

We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used - without any model fine-tuning - for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification. Our approach to implementing AstroCLIP consists of two parts. First, we embed galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings. We then align the encoders using a contrastive loss. We apply our method to spectra from the Dark Energy Spectroscopic Instrument and images from its corresponding Legacy Imaging Survey. Overall, we find remarkable performance on all downstream tasks, even relative to supervised baselines. For example, for a task like photometric redshift prediction, we find similar performance to a specifically-trained ResNet18, and for additional tasks like physical property estimation (stellar mass, age, metallicity, and sSFR), we beat this supervised baseline by 19\% in terms of $R^2$. We also compare our results to a state-of-the-art self-supervised single-modal model for galaxy images, and find that our approach outperforms this benchmark by roughly a factor of two on photometric redshift estimation and physical property prediction in terms of $R^2$, while remaining roughly in-line in terms of morphology classification. Ultimately, our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.

LGMay 12, 2023
MoMo: Momentum Models for Adaptive Learning Rates

Fabian Schaipp, Ruben Ohana, Michael Eickenberg et al.

Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new Polyak-type adaptive learning rates that can be used on top of any momentum method, and require less tuning to perform well. We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M (stochastic gradient descent with momentum). MoMo uses momentum estimates of the losses and gradients sampled at each iteration to build a model of the loss function. Our model makes use of any known lower bound of the loss function by using truncation, e.g. most losses are lower-bounded by zero. The model is then approximately minimized at each iteration to compute the next step. We show how MoMo can be used in combination with any momentum-based method, and showcase this by developing MoMo-Adam, which is Adam with our new model-based adaptive learning rate. We show that MoMo attains a $\mathcal{O}(1/\sqrt{K})$ convergence rate for convex problems with interpolation, needing knowledge of no problem-specific quantities other than the optimal value. Additionally, for losses with unknown lower bounds, we develop on-the-fly estimates of a lower bound, that are incorporated in our model. We show that MoMo and MoMo-Adam improve over SGD-M and Adam in terms of robustness to hyperparameter tuning for training image classifiers on MNIST, CIFAR, and Imagenet, for recommender systems on Criteo, for a transformer model on the translation task IWSLT14, and for a diffusion model.

MLFeb 4, 2022
Complex-to-Real Sketches for Tensor Products with Applications to the Polynomial Kernel

Jonas Wacker, Ruben Ohana, Maurizio Filippone

Randomized sketches of a tensor product of $p$ vectors follow a tradeoff between statistical efficiency and computational acceleration. Commonly used approaches avoid computing the high-dimensional tensor product explicitly, resulting in a suboptimal dependence of $\mathcal{O}(3^p)$ in the embedding dimension. We propose a simple Complex-to-Real (CtR) modification of well-known sketches that replaces real random projections by complex ones, incurring a lower $\mathcal{O}(2^p)$ factor in the embedding dimension. The output of our sketches is real-valued, which renders their downstream use straightforward. In particular, we apply our sketches to $p$-fold self-tensored inputs corresponding to the feature maps of the polynomial kernel. We show that our method achieves state-of-the-art performance in terms of accuracy and speed compared to other randomized approximations from the literature.

CVJul 6, 2021
ROPUST: Improving Robustness through Fine-tuning with Photonic Processors and Synthetic Gradients

Alessandro Cappelli, Julien Launay, Laurent Meunier et al.

Robustness to adversarial attacks is typically obtained through expensive adversarial training with Projected Gradient Descent. Here we introduce ROPUST, a remarkably simple and efficient method to leverage robust pre-trained models and further increase their robustness, at no cost in natural accuracy. Our technique relies on the use of an Optical Processing Unit (OPU), a photonic co-processor, and a fine-tuning step performed with Direct Feedback Alignment, a synthetic gradient training scheme. We test our method on nine different models against four attacks in RobustBench, consistently improving over state-of-the-art performance. We perform an ablation study on the single components of our defense, showing that robustness arises from parameter obfuscation and the alternative training method. We also introduce phase retrieval attacks, specifically designed to increase the threat level of attackers against our own defense. We show that even with state-of-the-art phase retrieval techniques, ROPUST remains an effective defense.

LGJun 7, 2021
Photonic Differential Privacy with Direct Feedback Alignment

Ruben Ohana, Hamlet J. Medina Ruiz, Julien Launay et al.

Optical Processing Units (OPUs) -- low-power photonic chips dedicated to large scale random projections -- have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs a solution of choice to provide a private-by-design training. We provide a theoretical analysis of our adaptive privacy mechanism, carefully measuring how the noise of optical random projections propagates in the process and gives rise to provable Differential Privacy. Finally, we conduct experiments demonstrating the ability of our learning procedure to achieve solid end-task performance.

MLApr 29, 2021
Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

Daniel Hesslow, Alessandro Cappelli, Igor Carron et al.

Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units we show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.

MLNov 24, 2020
Align, then memorise: the dynamics of learning with feedback alignment

Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana et al.

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory for the success of DFA. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment, and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorise process occurs sequentially from the bottom layers of the network to the top.

MLJun 12, 2020
Reservoir Computing meets Recurrent Kernels and Structured Transforms

Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan et al.

Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.

ETOct 22, 2019
Kernel computations from large-scale random features obtained by Optical Processing Units

Ruben Ohana, Jonas Wacker, Jonathan Dong et al.

Approximating kernel functions with random features (RFs)has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain. More specifically, the OPU performs the multiplication of input vectors by a large random matrix with complex-valued i.i.d. Gaussian entries, followed by the application of an element-wise squared absolute value operation - this last nonlinearity being intrinsic to the sensing process. In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. Experiments demonstrate that the OPU kernel and its RF approximation achieve competitive performance in applications using kernel ridge regression and transfer learning for image classification. Crucially, thanks to the use of the OPU, these results are obtained with time and energy savings.