CL · Feb 24

On Data Engineering for Scaling LLM Terminal Capabilities

arXiv:2602.21193v1 · 5 citations · h-index: 40 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a gap in publicly documented data engineering practices for terminal agents and should benefit researchers working on the terminal capabilities of LLMs. It is an incremental advance, because it builds on existing base models (Qwen3) and an existing benchmark (Terminal-Bench 2.0).

The paper tackles the lack of disclosed training-data strategies for LLM-based terminal agents. It introduces Terminal-Task-Gen, a synthetic task generation pipeline, and Terminal-Corpus, the open-source dataset that pipeline produces. Models trained on this data, the Nemotron-Terminal family, achieve substantial gains on Terminal-Bench 2.0; for example, Nemotron-Terminal-8B improves from 2.5% to 13.0%.

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long-context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B improves from 4.0% to 20.2%, and Nemotron-Terminal-32B improves from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this domain, we open-source our model checkpoints and most of our synthetic datasets at https://huggingface.co/collections/nvidia/nemotron-terminal.
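The headline numbers can be made concrete with a small calculation. The sketch below uses only the Terminal-Bench 2.0 percentages quoted in the abstract. It treats each "improves from" value as the score of the Qwen3 model that the corresponding Nemotron-Terminal model is initialized from. For each size, it derives the absolute gain in percentage points and the fold improvement over that starting score. The names `REPORTED_SCORES` and `summarize_gains` are hypothetical and exist only for this illustration; they are not part of the released code.

```python
# Terminal-Bench 2.0 scores (%) as quoted in the abstract:
# size -> (starting score, Nemotron-Terminal score). Hypothetical helper names.
REPORTED_SCORES = {
    "8B": (2.5, 13.0),
    "14B": (4.0, 20.2),
    "32B": (3.4, 27.4),
}


def summarize_gains(scores):
    """Return {size: (absolute gain in points, fold improvement)} for each model size."""
    summary = {}
    for size, (start, tuned) in scores.items():
        absolute = round(tuned - start, 1)  # percentage-point gain
        fold = round(tuned / start, 2)      # final score as a multiple of the starting score
        summary[size] = (absolute, fold)
    return summary


if __name__ == "__main__":
    for size, (absolute, fold) in summarize_gains(REPORTED_SCORES).items():
        print(f"Nemotron-Terminal-{size}: +{absolute} points ({fold}x the starting score)")
```

Because the starting scores sit near the floor of the benchmark (2.5% to 4.0%), the fold improvement can look inflated. The absolute gain, for example 24.0 points for the 32B model, is arguably the more informative of the two figures.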

