LGApr 18
When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin NanoJason Yoo, Shailesh Garg, Souvik Chakraborty et al.
Spiking neural operators are appealing for neuromorphic edge computing because event-driven substrates can, in principle, translate sparse activity into lower latency and energy. Whether that advantage survives deployment on commodity edge-GPU software stacks, however, remains unclear. We study this question on a Jetson Orin Nano 8 GB using five pretrained variable-spiking wavelet neural operator (VS-WNO) checkpoints and five matched dense wavelet neural operator (WNO) checkpoints on the Darcy rectangular benchmark. On a reference-aligned path, VS-WNO exhibits substantial algorithmic sparsity, with mean spike rates decreasing from 54.26% at the first spiking layer to 18.15% at the fourth. On a deployment-style request path, however, this sparsity does not reduce deployed cost: VS-WNO reaches 59.6 ms latency and 228.0 mJ dynamic energy per inference, whereas dense WNO reaches 53.2 ms and 180.7 mJ, while also achieving slightly lower reference-path error (1.77% versus 1.81%). Nsight Systems indicates that the request path remains launch-dominated and dense rather than sparsity-aware: for VS-WNO, cudaLaunchKernel accounts for 81.6% of CUDA API time within the latency window, and dense convolution kernels account for 53.8% of GPU kernel time; dense WNO shows the same pattern. On this Jetson-class GPU stack, spike sparsity is measurable but does not reduce deployed cost because the runtime does not suppress dense work as spike activity decreases.
LGApr 2
Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular GridsWilliam Howes, Jason Yoo, Kazuma Kobayashi et al.
Accurate sensing of spatially distributed physical fields typically requires dense instrumentation, which is often infeasible in real-world systems due to cost, accessibility, and environmental constraints. Physics-based solvers address this through direct numerical integration of governing equations, but their computational latency and power requirements preclude real-time use in resource-constrained monitoring and control systems. Here we introduce VIRSO (Virtual Irregular Real-Time Sparse Operator), a graph-based neural operator for sparse-to-dense reconstruction on irregular geometries, and a variable-connectivity algorithm, Variable KNN (V-KNN), for mesh-informed graph construction. Unlike prior neural operators that treat hardware deployability as secondary, VIRSO reframes inference as measurement: the combination of both spectral and spatial analysis provides accurate reconstruction without the high latency and power consumption of previous graph-based methodologies with poor scalability, presenting VIRSO as a potential candidate for edge-constrained, real-time virtual sensing. We evaluate VIRSO on three nuclear thermal-hydraulic benchmarks of increasing geometric and multiphysics complexity, across reconstruction ratios from 47:1 to 156:1. VIRSO achieves mean relative $L_2$ errors below 1%, outperforming other benchmark operators while using fewer parameters. The full 10-layer configuration reduces the energy-delay product (EDP) from ${\approx}206$ J$\cdot$ms for the graph operator baseline to $10.1$ J$\cdot$ms on an NVIDIA H200. Implemented on an NVIDIA Jetson Orin Nano, all configurations of VIRSO provide sub-10 W power consumption and sub-second latency. These results establish the edge-feasibility and hardware-portability of VIRSO and present compute-aware operator learning as a new paradigm for real-time sensing in inaccessible and resource-constrained environments.
LGMay 20, 2022
BayesPCN: A Continually Learnable Predictive Coding Associative MemoryJason Yoo, Frank Wood
Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community's interest in associative memories has recently been rekindled, most work has focused on memory recall ($read$) over memory learning ($write$). In this paper, we present BayesPCN, a hierarchical associative memory capable of performing continual one-shot memory writes without meta-learning. Moreover, BayesPCN is able to gradually forget past observations ($forget$) to free its memory. Experiments show that BayesPCN can recall corrupted i.i.d. high-dimensional data observed hundreds to a thousand ``timesteps'' ago without a large drop in recall ability compared to the state-of-the-art offline-learned parametric memory models.
LGDec 10, 2020Code
Ensemble Squared: A Meta AutoML SystemJason Yoo, Tony Joseph, Dylan Yung et al.
There are currently many barriers that prevent non-experts from exploiting machine learning solutions ranging from the lack of intuition on statistical learning techniques to the trickiness of hyperparameter tuning. Such barriers have led to an explosion of interest in automated machine learning (AutoML), whereby an off-the-shelf system can take care of many of the steps for end-users without the need for expertise in machine learning. This paper presents Ensemble Squared (Ensemble$^2$), an AutoML system that ensembles the results of state-of-the-art open-source AutoML systems. Ensemble$^2$ exploits the diversity of existing AutoML systems by leveraging the differences in their model search space and heuristics. Empirically, we show that diversity of each AutoML system is sufficient to justify ensembling at the AutoML system level. In demonstrating this, we also establish new state-of-the-art AutoML results on the OpenML tabular classification benchmark.
LGFeb 14, 2024
Layerwise Proximal Replay: A Proximal Point Method for Online Continual LearningJason Yoo, Yunpeng Liu, Frank Wood et al.
In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.
CVJun 7, 2024
Lifelong Learning of Video Diffusion Models From a Single Video StreamJason Yoo, Yingchen He, Saeid Naderiparizi et al.
This work demonstrates that training autoregressive video diffusion models from a single video stream$\unicode{x2013}$resembling the experience of embodied agents$\unicode{x2013}$is not only possible, but can also be as effective as standard offline training given the same number of gradient steps. Our work further reveals that this main result can be achieved using experience replay methods that only retain a subset of the preceding video stream. To support training and evaluation in this setting, we introduce four new datasets for streaming lifelong generative video modeling: Lifelong Bouncing Balls, Lifelong 3D Maze, Lifelong Drive, and Lifelong PLAICraft, each consisting of one million consecutive frames from environments of increasing complexity.