Chintan Patel

CL
h-index57
7papers
201citations
Novelty44%
AI Score50

7 Papers

95.3LGApr 14Code
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye et al. · amazon-science, cmu

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.

CLAug 20, 2025
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Aarti Basant, Abhijit Khairnar, Abhijit Paithankar et al. · nvidia

We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.

CLDec 23, 2025
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aaron Blakeman, Aaron Grattafiori, Aarti Basant et al. · nvidia

We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. It achieves up to 3.3x higher inference throughput than similarly-sized open models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.

CLDec 24, 2025
NVIDIA Nemotron 3: Efficient and Open Intelligence

Aaron Blakeman, Aaron Grattafiori, Aarti Basant et al. · nvidia

We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning enabling reasoning, multi-step tool use, and support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.

CRSep 14, 2023
AIDPS:Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks

Soumadeep Das, Aryan Mohammadi Pasikhani, Prosanta Gope et al.

Underwater Acoustic Sensor Networks (UW-ASNs) are predominantly used for underwater environments and find applications in many areas. However, a lack of security considerations, the unstable and challenging nature of the underwater environment, and the resource-constrained nature of the sensor nodes used for UW-ASNs (which makes them incapable of adopting security primitives) make the UW-ASN prone to vulnerabilities. This paper proposes an Adaptive decentralised Intrusion Detection and Prevention System called AIDPS for UW-ASNs. The proposed AIDPS can improve the security of the UW-ASNs so that they can efficiently detect underwater-related attacks (e.g., blackhole, grayhole and flooding attacks). To determine the most effective configuration of the proposed construction, we conduct a number of experiments using several state-of-the-art machine learning algorithms (e.g., Adaptive Random Forest (ARF), light gradient-boosting machine, and K-nearest neighbours) and concept drift detection algorithms (e.g., ADWIN, kdqTree, and Page-Hinkley). Our experimental results show that incremental ARF using ADWIN provides optimal performance when implemented with One-class support vector machine (SVM) anomaly-based detectors. Furthermore, our extensive evaluation results also show that the proposed scheme outperforms state-of-the-art bench-marking methods while providing a wider range of desirable features such as scalability and complexity.

CRJan 15, 2021
CARE: Lightweight Attack Resilient Secure Boot Architecturewith Onboard Recovery for RISC-V based SOC

Avani Dave, Nilanjan Banerjee, Chintan Patel

Recent technological advancements have proliferated the use of small embedded devices for collecting, processing, and transferring the security-critical information. The Internet of Things (IoT) has enabled remote access and control of these network-connected devices. Consequently, an attacker can exploit security vulnerabilities and compromise these devices. In this context, the secure boot becomes a useful security mechanism to verify the integrity and authenticity of the software state of the devices. However, the current secure boot schemes focus on detecting the presence of potential malware on the device but not on disinfecting and restoring the soft-ware to a benign state. This manuscript presents CARE- the first secure boot framework that provides detection, resilience, and onboard recovery mechanism for the com-promised devices. The framework uses a prototype hybrid CARE: Code Authentication and Resilience Engine to verify the software state and restore it to a benign state. It uses Physical Memory Protection (PMP) and other security enchaining techniques of RISC-V processor to pro-vide resilience from modern attacks. The state-of-the-art comparison and performance analysis results indicate that the proposed secure boot framework provides a promising resilience and recovery mechanism with very little 8 % performance and resource overhead

CRJan 15, 2021
SRACARE: Secure Remote Attestation with Code Authentication and Resilience Engine

Avani Dave, Nilanjan Banerjee, Chintan Patel

Recent technological advancements have enabled proliferated use of small embedded and IoT devices for collecting, processing, and transferring the security-critical information and user data. This exponential use has acted as a catalyst in the recent growth of sophisticated attacks such as the replay, man-in-the-middle, and malicious code modification to slink, leak, tweak or exploit the security-critical information in malevolent activities. Therefore, secure communication and software state assurance (at run-time and boot-time) of the device has emerged as open security problems. Furthermore, these devices need to have an appropriate recovery mechanism to bring them back to the known-good operational state. Previous researchers have demonstrated independent methods for attack detection and safeguard. However, the majority of them lack in providing onboard system recovery and secure communication techniques. To bridge this gap, this manuscript proposes SRACARE- a framework that utilizes the custom lightweight, secure communication protocol that performs remote/local attestation, and secure boot with an onboard resilience recovery mechanism to protect the devices from the above-mentioned attacks. The prototype employs an efficient lightweight, low-power 32-bit RISC-V processor, secure communication protocol, code authentication, and resilience engine running on the Artix 7 Field Programmable Gate Array(FPGA) board. This work presents the performance evaluation and state-of-the-art comparison results, which shows promising resilience to attacks and demonstrate the novel protection mechanism with onboard recovery. The framework achieves these with only 8 % performance overhead and a very small increase in hardware-software footprint.