Saurav Mishra

h-index1
2papers

2 Papers

74.9LGApr 14Code
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye et al. · amazon-science, cmu

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.

CVOct 26, 2024
CAVE-Net: Classifying Abnormalities in Video Capsule Endoscopy

Ishita Harish, Saurav Mishra, Neha Bhadoria et al.

Accurate classification of medical images is critical for detecting abnormalities in the gastrointestinal tract, a domain where misclassification can significantly impact patient outcomes. We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets. Using a Convolutional Block Attention Module along with a Deep Neural Network, we leverage the unique feature extraction capabilities of each model to enhance the overall accuracy. The classification models, such as Random Forest, XGBoost, Support Vector Machine and K-Nearest Neighbors are introduced to further diversify the predictive power of proposed ensemble. By using these methods, the proposed framework, CAVE-Net, provides robust feature discrimination and improved classification results. Experimental evaluations demonstrate that the CAVE-Net achieves high accuracy and robustness across challenging and imbalanced classes, showing significant promise for broader applications in computer vision tasks.