LG · Mar 11

H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code

Amit Singh, Vedant Nipane, Pulkit Agrawal, Jatin Kishnani

arXiv:2603.11139v1 · 12.5 · h-index: 30 · Has Code

Predicted impact: top 12% in LG · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses a challenge facing developers and engineers who work with specialized embedded systems code. It is an incremental improvement, achieved through targeted adaptation of existing methods.

The paper addresses the limited code generation ability of large language models in low-level embedded systems programming. It introduces H2LooP Spark Preview, a continual pretraining pipeline that adapts an open 7B model to this domain. The adapted model reduces in-domain perplexity by 70.4% and outperforms larger models such as Claude Opus 4.6 in token accuracy on 8 of 13 embedded code completion categories.

Large language models (LLMs) demonstrate strong code generation abilities in general-purpose programming languages but remain limited in specialized domains such as low-level embedded systems programming. This domain involves hardware register manipulation, vendor-specific SDKs, real-time operating system APIs, and hardware abstraction layers that are underrepresented in standard pretraining corpora. We introduce H2LooP Spark Preview, a continual pretraining (CPT) pipeline that adapts OLMo-3-7B, a fully open language model, to the embedded systems domain using BF16 LoRA with rank-stabilized scaling on 8 NVIDIA H100 GPUs. Our training corpus is constructed from repository-datasheet pairs covering 100B tokens of raw embedded systems data across 117 manufacturers, processed using the hierarchical datasheet-to-code mapping approach proposed in SpecMap (Nipane et al., 2026). The resulting curated dataset split contains 23.5B tokens across 13 embedded domains. Continual pretraining with high-rank LoRA (r=512) yields substantial gains, reducing in-domain perplexity by 70.4% and held-out repository perplexity by 66.1%. On generative code completion benchmarks spanning 13 embedded domains, our 7B model outperforms Claude Opus 4.6 and Qwen3-Coder-30B on 8 categories in token accuracy, showing that targeted continual pretraining enables smaller open-weight models to rival frontier systems on specialized technical tasks. We release the production training checkpoint on Hugging Face as an open-source artifact.
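The abstract's two central quantities, rank-stabilized LoRA scaling and relative perplexity reduction, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the standard definitions: conventional LoRA scales the low-rank update by alpha / r, rank-stabilized LoRA scales it by alpha / sqrt(r), and perplexity is the exponential of the mean per-token negative log-likelihood. The alpha value and the perplexity numbers below are hypothetical, chosen only to make the arithmetic visible; the abstract reports r=512 and the 70.4% figure but not these inputs.

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool = True) -> float:
    """Scaling factor applied to the low-rank update B @ A.

    Conventional LoRA uses alpha / rank; rank-stabilized LoRA uses
    alpha / sqrt(rank), so the update does not shrink as fast at high rank.
    """
    if rank_stabilized:
        return alpha / math.sqrt(rank)
    return alpha / rank


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood per token."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(base_ppl: float, adapted_ppl: float) -> float:
    """Percentage by which perplexity dropped relative to the base model."""
    return 100.0 * (base_ppl - adapted_ppl) / base_ppl


# Hypothetical alpha; rank 512 is the value reported in the abstract.
ALPHA, RANK = 16.0, 512
standard = lora_scale(ALPHA, RANK, rank_stabilized=False)   # 16 / 512
stabilized = lora_scale(ALPHA, RANK, rank_stabilized=True)  # 16 / sqrt(512)

# Hypothetical perplexities that happen to give a 70.4% reduction.
reduction = relative_reduction(base_ppl=10.0, adapted_ppl=2.96)

print(f"standard scale:   {standard:.5f}")
print(f"stabilized scale: {stabilized:.5f}")
print(f"perplexity reduction: {reduction:.1f}%")
```

At rank 512 the conventional factor is about 23 times smaller than the rank-stabilized one (a ratio of sqrt(512)), which is one plausible reason to pair high-rank adapters with rank-stabilized scaling.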
