Jason Grant

2 Papers

74.9 · LG · Apr 14 · Code
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye et al. · amazon-science, cmu

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens, followed by post-training using supervised fine-tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.

CV · May 19, 2016
Hierarchical Clustering in Face Similarity Score Space

Jason Grant, Patrick Flynn

Similarity scores in face recognition represent the proximity between pairs of images as computed by a matching algorithm. Given a large set of images and the proximities between all pairs, a similarity score space is defined. Cluster analysis was applied to the similarity score space to develop various taxonomies. Given the number of subjects in the dataset, we used hierarchical methods to aggregate images of the same subject. We also explored the hierarchy above and below the subject level, including clusters that reflect gender and ethnicity. Evidence supports the existence of clustering by race, gender, subject, and illumination condition.
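
As a rough illustration of the idea in this abstract, the sketch below runs agglomerative (bottom-up) hierarchical clustering directly on a matrix of pairwise similarity scores. The abstract does not say which linkage criterion the authors used, so average linkage is an assumption here, and the function name `average_linkage` and the 4x4 score matrix are hypothetical toy values, not data from the paper.

```python
def average_linkage(similarity, num_clusters):
    """Agglomerative clustering on a symmetric similarity matrix.

    Assumption: average linkage (the paper's linkage is not stated).
    Requires 1 <= num_clusters <= len(similarity).
    """
    # Start with every image in its own cluster.
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters with the highest mean pairwise similarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(similarity[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        # Merge the most similar pair into one cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy scores (hypothetical): images 0 and 1 match strongly, as do 2 and 3.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

subject_level = average_linkage(similarity, 2)
print(subject_level)
```

Stopping the merges at different cluster counts corresponds to cutting the hierarchy at different levels, which is how groupings above or below the subject level could be inspected.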