SE · AI · Mar 17

InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Yaxin Du, Joseph Li, Fanglin Xu, Yizhi Li, Lin Jing

arXiv:2603.16790 · 68.82 · Citations h-index: 33 · Has Code

Predicted impact: top 1% in SE · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses performance gaps in industrial code intelligence for domains such as chip design and embedded systems, though it appears incremental because it builds on existing foundation-model approaches.

The paper tackles the underperformance of code large language models in industrial scenarios that require reasoning about hardware semantics and resource constraints. It introduces InCoder-32B, a 32B-parameter model that achieves competitive performance on general coding tasks and establishes strong baselines across industrial domains.

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.
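Two stages of the training recipe above can be made concrete with a small sketch: the progressive context extension from 8K to 128K tokens during mid-training, and execution-grounded verification during post-training. The abstract gives only the endpoints of the context schedule and does not describe the verifier, so everything here is an assumption for illustration. The hypothetical `context_schedule` helper assumes the length doubles at each stage and interprets "8K" as 8,192 tokens. The hypothetical `passes_tests` and `filter_verified` helpers assume verification means keeping only candidate solutions whose accompanying tests execute without error. This is not the authors' implementation.

```python
from typing import List, Tuple


# Hypothetical helper: context lengths for progressive mid-training.
# ASSUMPTION: the abstract states only the endpoints (8K -> 128K);
# doubling at each stage is chosen here purely for illustration.
def context_schedule(start: int = 8 * 1024, end: int = 128 * 1024) -> List[int]:
    """Return context lengths that double from `start` until `end` is reached."""
    if start <= 0 or end < start:
        raise ValueError("require 0 < start <= end")
    lengths = []
    length = start
    while length < end:
        lengths.append(length)
        length *= 2
    lengths.append(end)  # always finish exactly at the target length
    return lengths


# Hypothetical helper: execution-grounded check of one candidate.
# ASSUMPTION: a sample counts as verified if its tests run without raising.
# NOTE: exec() is NOT a sandbox and has no timeout; a real pipeline would
# isolate execution (separate process, resource limits, time limits).
def passes_tests(candidate_src: str, test_src: str) -> bool:
    """Run candidate code, then its tests, in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        exec(test_src, namespace)  # failing asserts raise AssertionError
    except Exception:
        return False
    return True


def filter_verified(samples: List[Tuple[str, str]]) -> List[str]:
    """Keep only candidate sources whose tests pass."""
    return [cand for cand, tests in samples if passes_tests(cand, tests)]


if __name__ == "__main__":
    print(context_schedule())  # [8192, 16384, 32768, 65536, 131072]
    samples = [
        ("def add(a, b):\n    return a + b\n", "assert add(2, 3) == 5\n"),
        ("def add(a, b):\n    return a - b\n", "assert add(2, 3) == 5\n"),
    ]
    print(len(filter_verified(samples)))  # 1: only the correct candidate survives
```

Filtering on execution results, rather than on textual similarity to a reference, is what makes such a check "grounded": a sample is kept only if it actually runs and satisfies its tests.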

