AR LG Dec 5, 2025

Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures

arXiv:2512.06113v1
Originality: Incremental advance
AI Analysis

This work addresses performance bottlenecks in Model Recovery (MR) for real-time physical AI and digital twins. It is an incremental improvement built on domain-specific optimizations.

The paper tackles the inefficiency of Model Recovery (MR) on GPUs by introducing MERINDA, an FPGA-accelerated framework that restructures computation as a streaming dataflow pipeline. It reports up to 6.3x fewer cycles than an FPGA-based LTC baseline.

Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memory bandwidth, and high data-movement latency. We present MERINDA, an FPGA-accelerated MR framework that restructures computation as a streaming dataflow pipeline. MERINDA exploits on-chip locality through BRAM tiling, fixed-point kernels, and the concurrent use of LUT fabric and carry-chain adders to expose fine-grained spatial parallelism while minimizing off-chip traffic. This hardware-aware formulation removes synchronization bottlenecks and sustains high throughput across the iterative updates in MR. On representative MR workloads, MERINDA delivers up to 6.3x fewer cycles than an FPGA-based LTC baseline, enabling real-time performance for time-critical physical systems.
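
The abstract names fixed-point kernels and BRAM tiling but gives no implementation details here. As a rough illustration of those two ideas only, and not MERINDA's actual design, the Python sketch below computes a matrix-vector product in fixed-point arithmetic, processing the input in small tiles the way a kernel might work from an on-chip buffer. The Q-format width (`FRAC_BITS`), the tile size, and the helper names (`to_fixed`, `to_float`, `tiled_matvec`) are all assumptions made for this example.

```python
# Illustrative sketch only: fixed-point, tiled matrix-vector product.
# FRAC_BITS, the tile size, and all function names are hypothetical
# choices for this example, not values taken from the paper.

FRAC_BITS = 12            # assumed number of fractional bits (Q-format)
SCALE = 1 << FRAC_BITS    # scaling factor between float and fixed-point


def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def tiled_matvec(weights, vector, tile=4):
    """Compute weights @ vector on fixed-point integers, one tile of the
    input vector at a time.

    Each tile stands in for a block of data held in on-chip memory.
    Products carry 2 * FRAC_BITS fractional bits, so they are accumulated
    at full precision and shifted back to FRAC_BITS once at the end.
    """
    rows, cols = len(weights), len(vector)
    acc = [0] * rows
    for start in range(0, cols, tile):
        x_tile = vector[start:start + tile]      # one "on-chip" tile
        for r in range(rows):
            partial = 0
            for k, xv in enumerate(x_tile):
                partial += weights[r][start + k] * xv
            acc[r] += partial
    return [a >> FRAC_BITS for a in acc]


if __name__ == "__main__":
    W = [[to_fixed(v) for v in row] for row in [[0.5, -0.25], [1.0, 2.0]]]
    x = [to_fixed(v) for v in [2.0, 4.0]]
    print([to_float(q) for q in tiled_matvec(W, x, tile=1)])
```

Because integer addition is exact, the result does not depend on the tile size. In this toy model tiling changes only which data is resident at each step, which is the property a tiled hardware kernel relies on.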
