LGJul 15, 2024Code
Separable Operator NetworksXinling Yu, Sean Hooten, Ziyue Liu et al.
Operator learning has become a powerful tool in machine learning for modeling complex physical systems governed by partial differential equations (PDEs). Although Deep Operator Networks (DeepONet) show promise, they require extensive data acquisition. Physics-informed DeepONets (PI-DeepONet) mitigate data scarcity but suffer from inefficient training processes. We introduce Separable Operator Networks (SepONet), a novel framework that significantly enhances the efficiency of physics-informed operator learning. SepONet uses independent trunk networks to learn basis functions separately for different coordinate axes, enabling faster and more memory-efficient training via forward-mode automatic differentiation. We provide a universal approximation theorem for SepONet proving the existence of a separable approximation to any nonlinear continuous operator. Then, we comprehensively benchmark its representational capacity and computational performance against PI-DeepONet. Our results demonstrate SepONet's superior performance across various nonlinear and inseparable PDEs, with SepONet's advantages increasing with problem complexity, dimension, and scale. For 1D time-dependent PDEs, SepONet achieves up to 112x faster training and 82x reduction in GPU memory usage compared to PI-DeepONet, while maintaining comparable accuracy. For the 2D time-dependent nonlinear diffusion equation, SepONet efficiently handles the complexity, achieving a 6.44% mean relative $\ell_{2}$ test error, while PI-DeepONet fails due to memory constraints. This work paves the way for extreme-scale learning of continuous mappings between infinite-dimensional function spaces. Open source code is available at \url{https://github.com/HewlettPackard/separable-operator-networks}.
16.8ETMay 14
Accelerating Hybrid XOR$-$CNF Boolean Satisfiability Problems Natively with In-Memory ComputingHaesol Im, Fabian Böhm, Giacomo Pedretti et al.
The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunication, solutions can often be found more efficiently by representing them with a combination of exclusive OR (XOR) and conjunctive normal form (CNF) clauses. We propose a hardware accelerator architecture that natively embeds and solves such hybrid XOR--CNF problems using in-memory computing hardware. To achieve this, we introduce an algorithm and demonstrate, both experimentally and through simulations, how it can be efficiently implemented with memristor crossbar arrays. Compared to the conventional approaches that translate XOR--CNF problems to pure CNF problems, our simulations show that the accelerator improves computation speed, energy efficiency, and chip area utilization of in-memory accelerators by $\sim$10$\times$ for a set of hard cryptographic benchmarking problems. Moreover, the accelerator achieves a $\sim$10$\times$ speedup and a $\sim$1000$\times$ gain in energy efficiency over state-of-the-art SAT solvers running on CPUs.
19.2OCMay 14
Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment ProblemHaesol Im, Chan-Woo Yang, Moslem Noori et al.
Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. It has been demonstrated in the literature that such special-purpose solvers can outperform traditional complementary metal--oxide--semiconductor architectures by orders of magnitude with respect to timing metrics on synthetic problems. However, they face challenges with constrained problems such as the quadratic assignment problem (QAP), where mapping to binary formulations such as QUBO introduces overhead and limits parallelism. In-memory computing (IMC) devices, such as memristor-based analog Ising machines, offer significant speed-ups and efficiency gains over traditional CPU-based solvers, particularly for solving combinatorial optimization problems. In this work, we present a novel hardware-aware QAP optimization framework designed for IMC hardware. By co-designing the local search heuristic with the underlying hardware, we exploit the intrinsic massive parallelism that allows for computing of full neighbourhoods simultaneously to make update decisions. We ensure binary solutions remain feasible by selecting local moves that lead to neighbouring feasible solutions, leveraging feasible-space search heuristics and the underlying structure of a given problem. Our approach is compatible with both digital computers and analog hardware. We demonstrate its effectiveness in CPU implementations by comparing it with state-of-the-art heuristics for solving the QAP.
LGMar 20, 2025
A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough?Moslem Noori, Elisabetta Valiante, Thomas Van Vaerenbergh et al.
A key trait of stochastic optimizers is that multiple runs of the same optimizer in attempting to solve the same problem can produce different results. As a result, their performance is evaluated over several repeats, or runs, on the problem. However, the accuracy of the estimated performance metrics depends on the number of runs and should be studied using statistical tools. We present a statistical analysis of the common metrics, and develop guidelines for experiment design to measure the optimizer's performance using these metrics to a high level of confidence and accuracy. To this end, we first discuss the confidence interval of the metrics and how they are related to the number of runs of an experiment. We then derive a lower bound on the number of repeats in order to guarantee achieving a given accuracy in the metrics. Using this bound, we propose an algorithm to adaptively adjust the number of repeats needed to ensure the accuracy of the evaluated metric. Our simulation results demonstrate the utility of our analysis and how it allows us to conduct reliable benchmarking as well as hyperparameter tuning and prevent us from drawing premature conclusions regarding the performance of stochastic optimizers.
COMP-PHJun 30, 2021
Inverse Design of Grating Couplers Using the Policy Gradient Method from Reinforcement LearningSean Hooten, Raymond G. Beausoleil, Thomas Van Vaerenbergh
We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence over competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8$^\circ$ grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10$\times$ fewer simulations than control cases.
NESep 30, 2015
Towards Trainable Media: Using Waves for Neural Network-Style TrainingMichiel Hermans, Thomas Van Vaerenbergh
In this paper we study the concept of using the interaction between waves and a trainable medium in order to construct a matrix-vector multiplier. In particular we study such a device in the context of the backpropagation algorithm, which is commonly used for training neural networks. Here, the weights of the connections between neurons are trained by multiplying a `forward' signal with a backwards propagating `error' signal. We show that this concept can be extended to trainable media, where the gradient for the local wave number is given by multiplying signal waves and error waves. We provide a numerical example of such a system with waves traveling freely in a trainable medium, and we discuss a potential way to build such a device in an integrated photonics chip.