Jebacyril Arockiaraj

3 Papers

15.4 · AR · May 18
Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA

Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Real-time, energy-efficient inference on edge devices is essential for graph classification across a range of applications. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that encodes input features into low-precision, high-dimensional vectors with simple element-wise operations, making it well-suited for resource-constrained edge platforms. Recent work enhances HDC accuracy for graph classification via Nyström kernel approximations. Edge acceleration of such methods faces several challenges: (i) redundancy among (landmark) samples selected via uniform sampling, (ii) storing the Nyström projection matrix under limited on-chip memory, (iii) expensive, contention-prone codebook lookups, and (iv) load imbalance due to irregular sparsity in SpMV. To address these challenges, we propose HyperX, the first end-to-end FPGA accelerator for Nyström-based HDC graph classification at the edge. HyperX integrates four key optimizations: (i) a hybrid landmark selection strategy combining uniform sampling with determinantal point processes (DPPs) to reduce redundancy while improving accuracy; (ii) a streaming architecture for the Nyström projection matrix that maximizes external memory bandwidth utilization; (iii) a minimal-perfect-hash lookup engine enabling $O(1)$ key-to-index mapping; and (iv) sparsity-aware SpMV engines with static load balancing. Implemented on an AMD Zynq UltraScale+ (ZCU104) FPGA, HyperX achieves $6.85\times$ ($4.32\times$) speedup and $169\times$ ($314\times$) energy efficiency gains over optimized CPU (GPU) baselines, while improving classification accuracy by $3.4\%$ on average across TUDataset benchmarks, a widely used standard for graph classification.
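
The abstract names static load balancing for SpMV but does not say how rows are distributed. As a rough illustration only, the sketch below assigns sparse-matrix rows to processing engines (PEs) with a generic greedy heuristic (heaviest row first, always to the least-loaded PE). The function names, the two-PE setup, and the per-row nonzero counts are all hypothetical and are not taken from HyperX.

```python
import heapq

def balance_rows(row_nnz, num_pes):
    """Greedily assign rows to PEs so per-PE nonzero counts stay close."""
    # Min-heap of (assigned_nonzeros, pe_id): the least-loaded PE is on top.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]
    # Place the heaviest rows first so lighter rows can even out the load.
    order = sorted(range(len(row_nnz)), key=lambda r: row_nnz[r], reverse=True)
    for row in order:
        load, pe = heapq.heappop(heap)
        assignment[pe].append(row)
        heapq.heappush(heap, (load + row_nnz[row], pe))
    return assignment

def pe_loads(row_nnz, assignment):
    """Total nonzeros each PE has to process."""
    return [sum(row_nnz[r] for r in rows) for rows in assignment]

# Hypothetical nonzeros per row of a sparse matrix (irregular sparsity).
row_nnz = [9, 8, 2, 2, 1, 1, 1, 0]
assignment = balance_rows(row_nnz, num_pes=2)
loads = pe_loads(row_nnz, assignment)
print(assignment, loads)
```

Because the assignment depends only on the sparsity pattern, it can be computed once offline. Splitting the same rows into two contiguous halves would give loads of 21 and 3 nonzeros, which is the kind of imbalance the abstract refers to.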

17.1 · CV · Apr 23
ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware. We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
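
To make the idea of word-packed binary hypervectors concrete, here is a minimal software sketch of a similarity search over packed words, where Hamming distance is an XOR followed by a bit count. The 64-bit word width, the 1024-bit dimensionality, the class labels, and every function name are assumptions for illustration. This is not ImageHD's implementation.

```python
import random

WORD_BITS = 64  # assumed word width
DIM = 1024      # hypothetical hypervector dimensionality

def random_hv(rng, dim=DIM):
    """Binary hypervector packed into dim // WORD_BITS integer words."""
    return [rng.getrandbits(WORD_BITS) for _ in range(dim // WORD_BITS)]

def hamming(a, b):
    """Hamming distance: XOR each word pair, then count the set bits."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def nearest_class(query, class_hvs):
    """Label of the class hypervector closest to the query."""
    return min(class_hvs, key=lambda label: hamming(query, class_hvs[label]))

def flip_bits(hv, positions):
    """Copy of hv with the given bit positions inverted (simulated noise)."""
    out = list(hv)
    for p in positions:
        out[p // WORD_BITS] ^= 1 << (p % WORD_BITS)
    return out

rng = random.Random(0)
class_hvs = {label: random_hv(rng) for label in ("cup", "phone", "ball")}
# A noisy copy of the "phone" hypervector with 100 distinct bits flipped.
query = flip_bits(class_hvs["phone"], rng.sample(range(DIM), 100))
predicted = nearest_class(query, class_hvs)
print(predicted, hamming(query, class_hvs["phone"]))
```

Each word comparison touches 64 dimensions at once, which is why packing maps naturally onto wide bitwise hardware.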

AR · Jan 27
Primitive-Driven Acceleration of Hyperdimensional Computing for Real-Time Image Classification

Dhruv Parikh, Jebacyril Arockiaraj, Viktor Prasanna

Hyperdimensional Computing (HDC) represents data using extremely high-dimensional, low-precision vectors, termed hypervectors (HVs), and performs learning and inference through lightweight, noise-tolerant operations. However, the high dimensionality, sparsity, and repeated data movement involved in HDC make these computations difficult to accelerate efficiently on conventional processors. As a result, executing core HDC operations (binding, permutation, bundling, and similarity search) on CPUs or GPUs often leads to suboptimal utilization, memory bottlenecks, and limits on real-time performance. In this paper, our contributions are two-fold. First, we develop an image-encoding algorithm that, similar in spirit to convolutional neural networks, maps local image patches to hypervectors enriched with spatial information. These patch-level hypervectors are then merged into a global representation using the fundamental HDC operations, enabling spatially sensitive and robust image encoding. This encoder achieves 95.67% accuracy on MNIST and 85.14% on Fashion-MNIST, outperforming prior HDC-based image encoders. Second, we design an end-to-end accelerator that implements these compute operations on an FPGA through a pipelined architecture that exploits parallelism both across the hypervector dimensionality and across the set of image patches. Our Alveo U280 implementation delivers 0.09 ms inference latency, achieving up to 1300x and 60x speedup over state-of-the-art CPU and GPU baselines, respectively.
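
The four core operations named above can be sketched for binary hypervectors as follows. This assumes common textbook choices (XOR for binding, cyclic rotation for permutation, majority vote for bundling, fraction of matching bits for similarity) and uses a toy position-tagging scheme with hypothetical names. It is not the paper's patch encoder.

```python
import random

DIM = 2048  # hypothetical hypervector dimensionality

def random_hv(rng):
    """Random binary hypervector stored as a list of 0/1 bits."""
    return [rng.getrandbits(1) for _ in range(DIM)]

def bind(a, b):
    """Binding: element-wise XOR (its own inverse)."""
    return [x ^ y for x, y in zip(a, b)]

def permute(a, shift):
    """Permutation: cyclic rotation to the right by `shift` positions."""
    shift %= len(a)
    return a[-shift:] + a[:-shift] if shift else list(a)

def bundle(hvs):
    """Bundling: element-wise majority vote (ties resolve to 1)."""
    n = len(hvs)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*hvs)]

def similarity(a, b):
    """Fraction of positions where the two hypervectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
pos_key = random_hv(rng)                      # assumed base position key
patches = [random_hv(rng) for _ in range(3)]  # stand-ins for patch HVs
# Tag patch i with its position by binding it to the key rotated i times.
encoded = [bind(p, permute(pos_key, i)) for i, p in enumerate(patches)]
image_hv = bundle(encoded)
# Unbinding position 0 recovers a noisy copy of the first patch.
recovered = bind(image_hv, permute(pos_key, 0))
print(similarity(recovered, patches[0]), similarity(recovered, patches[1]))
```

With three bundled patches, the recovered vector agrees with the original patch on roughly three quarters of its bits, while an unrelated hypervector agrees on about half. That gap is what similarity search relies on.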