Natalie Maman

2 Papers

9.3 AR May 28
elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh et al.

Neural Architecture Search (NAS) has become an important approach for automatically designing neural networks under task-specific and hardware-specific constraints. However, many existing NAS frameworks tightly couple search space definitions, model implementations, and deployment pipelines, which makes them difficult to extend to new hardware platforms and custom operators. In this paper, we present the elasticAI.explorer, an extensible Python framework for hardware-aware NAS built on top of Optuna. The framework introduces a YAML-based search space specification that is dynamically translated into executable neural network models during sampling. The approach supports layer-wise, cell-based, and hierarchical search spaces while maintaining a unified interface for optimization and deployment. Beyond architecture generation, the framework integrates hardware-specific code generation, Docker-based cross-compilation toolchains, and automated creation of on-device benchmarking binaries, enabling hardware-in-the-loop NAS workflows. The system further provides extensible evaluators for FLOPs, parameter count, and latency estimation. The elasticAI.explorer aims to reduce the engineering overhead of embedded AI deployment and accelerate research on hardware-aware NAS for heterogeneous accelerator platforms.
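The idea of a declarative search space that is turned into a concrete model at sampling time can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the dictionary keys, `sample_architecture`, and `count_parameters` are hypothetical names and do not reflect the elasticAI.explorer's actual schema or API. The dictionary stands in for a parsed YAML file, and a seeded `random.Random` stands in for the sampler that Optuna would provide.

```python
import random

# Hypothetical search space, as it might look after parsing a YAML file.
# The keys are illustrative and not the framework's real schema.
SEARCH_SPACE = {
    "input_features": 16,
    "output_features": 2,
    "num_layers": {"low": 1, "high": 3},
    "hidden_width": {"choices": [8, 16, 32]},
}


def sample_architecture(space, rng):
    """Draw one layer-wise architecture as a list of (in, out) widths."""
    depth = rng.randint(space["num_layers"]["low"], space["num_layers"]["high"])
    widths = [rng.choice(space["hidden_width"]["choices"]) for _ in range(depth)]
    dims = [space["input_features"], *widths, space["output_features"]]
    # Pair consecutive widths so each tuple describes one fully connected layer.
    return list(zip(dims[:-1], dims[1:]))


def count_parameters(layers):
    """Simple evaluator: weights plus biases of a stack of linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layers)


rng = random.Random(0)
arch = sample_architecture(SEARCH_SPACE, rng)
print(arch, count_parameters(arch))
```

In a setup like the one the abstract describes, the random draws would presumably be replaced by the optimizer's suggestions, and the parameter count would be one of several objectives next to FLOPs and measured latency.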

20.5 AR May 28
Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems

Lukas Einhaus, Natalie Maman, Julian Hoever et al.

1D-CNNs play a crucial role in time-series analysis on tiny smart sensor systems, e.g., for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as an interesting optimization technique for implementing such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the lookup tables of the FPGA. This enables highly resource-efficient networks with ultra-low latency but suffers from poor scalability. Previous work has explored using depthwise-separable convolutions to improve scalability. In this paper, we generalize this approach to consider additional forms of grouped convolutions. Based on this, we propose a novel type of convolutional block and an algorithm to guide the choice of hyperparameters for this block. We evaluate our approach on a medical time-series dataset for predicting atrial fibrillation using the MIT-BIH database (ECG recordings). The resulting hardware accelerators are small enough to be deployed on an AMD Spartan 7 S15. They achieve an F1 score of up to 95% while requiring only 2,844 LUTs and no DSPs or BRAM.
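The precomputation idea and its scalability problem can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' method: it assumes a single neuron with 1-bit inputs and a threshold activation, and the names `neuron`, `precompute_table`, and `table_entries` are hypothetical. The last function shows why grouping helps: the number of input patterns per output grows exponentially with the fan-in, and grouped convolutions reduce the fan-in.

```python
from itertools import product


def neuron(inputs, weights, bias):
    """Binarized neuron (assumption): inputs and output are in {0, 1}."""
    acc = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if acc >= 0 else 0


def precompute_table(weights, bias):
    """Enumerate all 2**n input patterns and store the neuron's output."""
    n = len(weights)
    return {bits: neuron(bits, weights, bias) for bits in product((0, 1), repeat=n)}


def table_entries(kernel_size, in_channels, groups, bits_per_value=1):
    """Input patterns per output value of a grouped 1D convolution."""
    fan_in = kernel_size * (in_channels // groups)
    return 2 ** (fan_in * bits_per_value)


weights, bias = [2, -1, 1], -1
table = precompute_table(weights, bias)

# At inference time the arithmetic is replaced by a single lookup.
print(table[(1, 0, 0)])

# Kernel size 3, 4 input channels: dense vs. depthwise (groups == channels).
print(table_entries(3, 4, groups=1), table_entries(3, 4, groups=4))
```

With a dense convolution, each output depends on 12 input bits in this example; with a depthwise convolution, it depends on only 3, so the table shrinks from 4096 entries to 8. Intermediate group counts trade table size against the mixing of channels.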