AI · Mar 17, 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
CL · Jun 15, 2022
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas et al. · amazon-science, gatech
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% and 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74% to 4.91% on an automatic measurement of full-system user dissatisfaction.
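The teacher-to-student distillation described above can be illustrated with the widely used soft-target objective (Hinton et al., 2015). This objective combines a temperature-scaled KL divergence between teacher and student output distributions with cross-entropy on the gold label. The abstract does not state which distillation loss the paper uses, so treat this as a generic sketch. `distillation_loss`, `temperature`, and `alpha` are hypothetical names and values chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    hard_label: int,
    temperature: float = 2.0,  # hypothetical softening temperature
    alpha: float = 0.5,        # hypothetical weight on the soft-target term
) -> float:
    """Blend KL(teacher || student) on softened outputs with CE on the gold label."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 so its gradient magnitude stays comparable across temperatures.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    soft_term = kl * temperature ** 2
    # Ordinary cross-entropy against the ground-truth label (T = 1).
    hard_term = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_term + (1 - alpha) * hard_term


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # e.g. intent logits from a large teacher encoder
    student = [2.0, 1.5, 0.2]  # logits from a small distilled student
    print(distillation_loss(student, teacher, hard_label=0))
```

A higher temperature exposes more of the teacher's "dark knowledge" about near-miss classes. `alpha` trades that signal off against fitting the labeled data directly.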
CV · Aug 8, 2022
Towards Semantic Communications: Deep Learning-Based Image Semantic Coding
Danlan Huang, Feifei Gao, Xiaoming Tao et al.
Semantic communications has received growing interest because it can remarkably reduce the amount of data to be transmitted without losing critical information. Most existing works explore semantic encoding and transmission for text and apply Natural Language Processing (NLP) techniques to interpret its meaning. In this paper, we conceive semantic communications for image data, which is much richer in semantics and more bandwidth-sensitive. We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level. First, we define the semantic concept of image data, comprising category, spatial arrangement, and visual features, as the representation unit, and propose a convolutional semantic encoder to extract semantic concepts. Second, we propose an image reconstruction criterion that evolves from traditional pixel similarity to semantic similarity and perceptual performance. Third, we design a novel RL-based semantic bit allocation model whose reward is the increase in rate-semantic-perceptual performance after encoding a given semantic concept with an adaptive quantization level. Thus, task-related information is preserved and properly reconstructed while less important data is discarded. Finally, we propose a Generative Adversarial Network (GAN) based semantic decoder that fuses both local and global features via an attention module. Experimental results demonstrate that the proposed RL-ASC is robust to noise, reconstructs visually pleasing and semantically consistent images, and reduces the bit cost several-fold compared to standard codecs and other deep learning-based image codecs.
NI · Mar 30
YUHENG-OS: A Cloud-Native Space Cluster Operating System
Jin Zhang, Jiachen Sun, Kai Liu et al.
As industry and academia continue to advance spaceborne computing and communication capabilities, the formation of cloud-native space clusters (CNSCs) has become an increasingly evident trend. This evolution progressively exposes the resource management challenges associated with coordinating fragmented and heterogeneous onboard resources while supporting large-scale and diverse space applications. However, directly transplanting mature terrestrial cloud-native cluster operating system paradigms into space is ineffective due to the fragmentation of spaceborne computing resources and satellite mobility, which collectively impose substantial challenges on resource awareness and orchestration. This article presents YUHENG-OS, a cloud-native space cluster operating system tailored for CNSCs. YUHENG-OS provides unified abstraction, awareness, and orchestration of heterogeneous spaceborne infrastructure, enabling cluster-wide task deployment and scheduling across distributed satellites. We introduce a four-layer system architecture and three key enabling technologies: modeling of heterogeneous resource demands for space tasks, fragmented heterogeneous resource awareness under network constraints, and matching of differentiated tasks with multidimensional heterogeneous resources under temporal dependency constraints. Evaluation results show that, compared with representative terrestrial cloud-native cluster operating systems exemplified by Kubernetes, YUHENG-OS achieves a substantially higher task completion ratio, with improvements of up to 98%. This advantage is primarily attributed to its ability to reduce resource awareness delay by 71%.
CL · May 19, 2024
MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning
Sanchit Sinha, Yuguang Yue, Victor Soto et al.
Adapting large language models (LLMs) to unseen tasks using in-context training samples, without fine-tuning, remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, several meta-training approaches have been proposed, such as MetaICL and MetaICT. These approaches meta-train pre-trained LLMs on a wide variety of diverse tasks. In essence, they perform in-context multi-task fine-tuning and evaluate on a disjoint test set of tasks. Although they achieve impressive performance, they never aim to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs that learns truly generalizable parameters, which not only perform well on disjoint tasks but also adapt to unseen tasks. We observe an average performance increase of 2% on unseen domains and a 4% improvement in adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines by an average of 2% in settings with limited training data, on both seen and unseen domains. Finally, we discuss the effects of task type, optimizer, and task complexity, an avenue barely explored in the meta-training literature. Extensive experiments across 7 task settings and two data settings demonstrate that models trained with MAML-en-LLM outperform state-of-the-art meta-training approaches.
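MAML-en-LLM builds on model-agnostic meta-learning (MAML). In MAML, an inner loop adapts shared parameters to each task, and an outer loop updates the shared initialization based on how well the adapted parameters perform. The abstract does not give the paper's update rule. The following is therefore a generic first-order MAML sketch on toy one-parameter regression tasks (y = slope · x), not the paper's LLM procedure. All names (`fomaml`, `make_task`, the learning rates) are hypothetical.

```python
import random
from typing import List, Tuple

Task = Tuple[List[float], List[float]]  # (inputs, targets) for y = slope * x


def mse_grad(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of mean squared error for the 1-parameter model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def make_task(slope: float, rng: random.Random, n: int = 10) -> Task:
    """Sample n points from the toy task y = slope * x."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [slope * x for x in xs]


def fomaml(
    slopes: List[float],
    inner_lr: float = 0.1,
    outer_lr: float = 0.05,
    steps: int = 500,
    seed: int = 0,
) -> float:
    """First-order MAML: meta-learn an initial weight that adapts quickly per task."""
    rng = random.Random(seed)
    w = 0.0  # shared initialization being meta-trained
    for _ in range(steps):
        slope = rng.choice(slopes)
        sx, sy = make_task(slope, rng)  # support set (inner-loop adaptation)
        qx, qy = make_task(slope, rng)  # query set (outer-loop evaluation)
        w_adapted = w - inner_lr * mse_grad(w, sx, sy)  # one inner SGD step
        # First-order approximation: apply the query gradient at the adapted
        # weight directly, dropping second-order terms through the inner step.
        w -= outer_lr * mse_grad(w_adapted, qx, qy)
    return w


if __name__ == "__main__":
    print(fomaml([1.0, 3.0]))  # an initialization between the two task optima
```

Full MAML would differentiate through the inner update. The first-order variant shown here avoids that second-order cost, which matters at LLM scale.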
CL · Feb 3, 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou, Giannis Karamanolakis, Victor Soto et al.
The recent success of specialized Large Language Models (LLMs) in domains such as mathematical reasoning and coding has led to growing interest in methods for merging these expert LLMs into a unified Mixture-of-Experts (MoE) model, with the goal of enhancing performance in each domain while retaining effectiveness on general tasks. However, the effective merging of expert models remains an open challenge, especially for models with highly divergent weight parameters or different architectures. State-of-the-art MoE merging methods only work with homogeneous model architectures and rely on simple unweighted averaging to merge expert layers, which does not address parameter interference and requires extensive fine-tuning of the merged MoE to restore performance. To address these limitations, this paper introduces new MoE merging techniques, including strategies to mitigate parameter interference, routing heuristics to reduce the need for MoE fine-tuning, and a novel method for merging experts with different architectures. Extensive experiments across multiple domains demonstrate the effectiveness of our proposed methods, reducing fine-tuning costs, improving performance over state-of-the-art methods, and expanding the applicability of MoE merging.
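The abstract contrasts simple unweighted averaging of expert layers with strategies that mitigate parameter interference. This sketch operates on the flattened weights of one layer. It shows the averaging baseline next to a TIES-style merge (trim small task-vector entries, elect a per-parameter sign, then average only the entries that agree with it). TIES-style merging is a known interference-mitigation technique swapped in for illustration; it is not necessarily the method MergeME proposes. `density` is a hypothetical parameter.

```python
from typing import List

Vector = List[float]


def average_merge(experts: List[Vector]) -> Vector:
    """Unweighted element-wise average of expert weights (the baseline)."""
    n = len(experts)
    return [sum(ws) / n for ws in zip(*experts)]


def trim(delta: Vector, density: float) -> Vector:
    """Keep roughly the largest-magnitude fraction `density` of entries; zero the rest."""
    k = max(1, int(round(density * len(delta))))
    threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
    return [d if abs(d) >= threshold else 0.0 for d in delta]


def ties_style_merge(base: Vector, experts: List[Vector], density: float = 0.5) -> Vector:
    """Merge task vectors (expert - base): trim, elect a sign, then take a disjoint mean."""
    deltas = [trim([e - b for e, b in zip(expert, base)], density) for expert in experts]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        sign = 1.0 if sum(column) >= 0 else -1.0  # elected sign for this parameter
        # Average only the non-zero entries whose sign agrees with the elected one,
        # so opposing updates from different experts do not cancel each other out.
        agreeing = [c for c in column if c != 0.0 and (c > 0) == (sign > 0)]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + update)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    math_expert = [3.0, 0.0, 2.0, -1.0]
    code_expert = [-1.0, 0.0, 2.0, 0.1]
    print(average_merge([math_expert, code_expert]))
    print(ties_style_merge(base, [math_expert, code_expert], density=1.0))
```

In the example, plain averaging shrinks the first parameter from 3.0 to 1.0 because the two experts pull in opposite directions. The sign-consensus merge keeps the dominant update instead.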
IT · Dec 30, 2021
Semantic Communications: Principles and Challenges
Zhijin Qin, Xiaoming Tao, Jianhua Lu et al.
Semantic communication, regarded as the breakthrough beyond the Shannon paradigm, aims at the successful transmission of the semantic information conveyed by the source rather than the accurate reception of each individual symbol or bit regardless of its meaning. This article provides an overview of semantic communications. After a brief review of Shannon information theory, we discuss the theory, frameworks, and deep learning-enabled system design of semantic communications. We also discuss performance metrics for semantic communications, which differ from the symbol/bit error rate used to measure conventional communication systems. The article concludes with several open questions in semantic communications.
IT · Nov 24, 2021
Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
Khaled B. Letaief, Yuanming Shi, Jianmin Lu et al.
The rapid growth of artificial intelligence (AI) applications is driving the further evolution of wireless networks. 6G is envisioned to be transformative and to shift wireless networks from "connected things" to "connected intelligence". However, state-of-the-art AI systems based on deep learning and big data analytics require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both training and inference. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G to seamlessly integrate sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we provide our vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models. We describe new design principles for wireless networks, service-driven resource allocation optimization methods, and a holistic end-to-end system architecture to support edge AI. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.
CL · Mar 29, 2021
Industry Scale Semi-Supervised Learning for Natural Language Understanding
Luoxin Chen, Francisco Garcia, Varun Kumar et al.
This paper presents a production Semi-Supervised Learning (SSL) pipeline based on the student-teacher framework, which leverages millions of unlabeled examples to improve Natural Language Understanding (NLU) tasks. We investigate two questions related to the use of unlabeled data in a production SSL context: 1) how to select samples from a huge unlabeled data pool that are beneficial for SSL training, and 2) how the selected data affect the performance of different state-of-the-art SSL techniques. We compare four widely used SSL techniques, Pseudo-Label (PL), Knowledge Distillation (KD), Virtual Adversarial Training (VAT), and Cross-View Training (CVT), in conjunction with two data selection methods: committee-based selection and submodular optimization based selection. We further examine the benefits and drawbacks of these techniques when applied to intent classification (IC) and named entity recognition (NER) tasks, and provide guidelines specifying when each of these methods might be beneficial for improving large-scale NLU systems.
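Committee-based selection combined with Pseudo-Label (PL) training can be sketched as follows. Several models label each unlabeled utterance, and vote entropy measures how much they disagree. This sketch keeps the examples the committee agrees on and attaches the majority label as a pseudo-label. The abstract does not say whether the paper favors agreement or disagreement, nor which score it uses. Vote entropy, `max_entropy`, and the toy keyword "models" are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[str], str]  # maps an utterance to an intent label


def vote_entropy(votes: Sequence[str]) -> float:
    """Entropy (in nats) of the committee's label votes; 0.0 means unanimous."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_and_pseudo_label(
    unlabeled: Sequence[str],
    committee: Sequence[Classifier],
    max_entropy: float = 0.0,  # hypothetical cutoff: 0.0 keeps only unanimous votes
) -> List[Tuple[str, str]]:
    """Keep examples whose vote entropy is within the cutoff; label them by majority vote."""
    selected = []
    for text in unlabeled:
        votes = [model(text) for model in committee]
        if vote_entropy(votes) <= max_entropy:
            label, _ = Counter(votes).most_common(1)[0]
            selected.append((text, label))
    return selected


if __name__ == "__main__":
    # Toy keyword "models" standing in for trained intent classifiers.
    m1 = lambda t: "weather" if "rain" in t else "music"
    m2 = lambda t: "weather" if ("rain" in t or "sun" in t) else "music"
    pool = ["will it rain", "is it sunny", "play jazz"]
    print(select_and_pseudo_label(pool, [m1, m2, m1]))
```

Raising `max_entropy` admits more, noisier pseudo-labels. The resulting pairs would then feed the student model's training set.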
CV · Mar 29, 2021
Category-Adaptive Domain Adaptation for Semantic Segmentation
Zhiming Wang, Yantian Luo, Danlan Huang et al.
Unsupervised domain adaptation (UDA) has become increasingly popular for tackling real-world problems where ground truth for the target domain is unavailable. Although it requires no tedious annotation work, UDA inevitably faces two problems: 1) how to narrow the domain discrepancy to boost transfer performance; and 2) how to improve the pseudo-annotation mechanism for self-supervised learning (SSL). In this paper, we focus on UDA for the semantic segmentation task. First, we introduce adversarial learning into a style-gap-bridging mechanism to keep the style information from the two domains in a similar space. Second, to balance pseudo labels across categories, we propose a category-adaptive threshold mechanism that chooses category-wise pseudo labels for SSL. The experiments use GTA5 as the source domain and Cityscapes as the target domain. The results show that our model outperforms state-of-the-art methods with a noticeable gain on cross-domain adaptation tasks.
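A category-adaptive threshold can be illustrated by giving each class its own confidence cutoff instead of one global value. The abstract does not state the paper's rule. This sketch assumes one common variant: each class keeps the top `keep_ratio` fraction of its own predictions, with the threshold capped by a global `max_threshold`. Both parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

IGNORE = -1  # label for pixels excluded from self-training

Prediction = Tuple[int, float]  # (predicted class, confidence) for one pixel


def category_thresholds(
    predictions: Sequence[Prediction],
    keep_ratio: float = 0.5,     # hypothetical: fraction of each class's pixels to keep
    max_threshold: float = 0.9,  # hypothetical global cap on any class threshold
) -> Dict[int, float]:
    """Derive one confidence threshold per predicted class."""
    by_class: Dict[int, List[float]] = defaultdict(list)
    for cls, conf in predictions:
        by_class[cls].append(conf)
    thresholds = {}
    for cls, confs in by_class.items():
        confs.sort(reverse=True)
        # Confidence of the last pixel inside this class's kept top fraction.
        idx = max(0, int(len(confs) * keep_ratio) - 1)
        thresholds[cls] = min(confs[idx], max_threshold)
    return thresholds


def pseudo_labels(
    predictions: Sequence[Prediction], thresholds: Dict[int, float]
) -> List[int]:
    """Assign the predicted class if it clears that class's threshold, else IGNORE."""
    return [cls if conf >= thresholds[cls] else IGNORE for cls, conf in predictions]


if __name__ == "__main__":
    preds = [(0, 0.99), (0, 0.95), (0, 0.9), (0, 0.8),   # easy, frequent class
             (1, 0.6), (1, 0.5), (1, 0.4), (1, 0.3)]     # hard, low-confidence class
    th = category_thresholds(preds)
    print(th, pseudo_labels(preds, th))
```

With a single global threshold of 0.9, class 1 would receive no pseudo-labels at all. Per-class thresholds keep the pseudo-label set balanced across categories, which is the motivation given in the abstract.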
IV · Mar 4, 2021
Perceptual Image Restoration with High-Quality Priori and Degradation Learning
Chaoyi Han, Yiping Duan, Xiaoming Tao et al.
Perceptual image restoration seeks high-fidelity images that most likely degrade to the given images. For better visual quality, previous work proposed searching for solutions within the natural image manifold by exploiting the latent space of a generative model. However, the quality of generated images is only guaranteed when the latent embedding lies close to the prior distribution. In this work, we propose to restrict the feasible region to the prior manifold. This is accomplished with a non-parametric metric between two distributions: the Maximum Mean Discrepancy (MMD). Moreover, we model the degradation process directly as a conditional distribution. We show that our model performs well in measuring the similarity between restored and degraded images. Instead of optimizing the long-criticized pixel-wise distance over degraded images, we rely on this model to find visually pleasing images with high probability. Our simultaneous restoration and enhancement framework generalizes well to complicated real-world degradation types. Experimental results on perceptual quality and no-reference image quality assessment (NR-IQA) demonstrate the superior performance of our method.
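The Maximum Mean Discrepancy compares two distributions through samples. With a kernel k, the squared MMD is E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]. This is a sketch of the standard biased (V-statistic) estimator with a Gaussian kernel. The kernel choice and bandwidth `sigma` are assumptions, since the abstract only names MMD.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, sigma: float) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def mmd2(xs: List[Point], ys: List[Point], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples xs and ys."""

    def mean_kernel(us: List[Point], vs: List[Point]) -> float:
        # Average kernel value over all pairs (u, v), including u == v.
        return sum(rbf(u, v, sigma) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


if __name__ == "__main__":
    prior = [[0.0], [1.0]]     # e.g. samples from the prior latent distribution
    close = [[0.1], [1.1]]     # embeddings near the prior
    far_off = [[5.0], [6.0]]   # embeddings that drifted away from it
    print(mmd2(prior, close), mmd2(prior, far_off))
```

Used as a penalty, a small MMD between latent embeddings and prior samples keeps the search inside the region where the generator produces high-quality images.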
IT · Jan 4, 2016
Approximate Message Passing with Nearest Neighbor Sparsity Pattern Learning
Xiangming Meng, Sheng Wu, Linling Kuang et al.
We consider the problem of recovering clustered sparse signals with no prior knowledge of the sparsity pattern. Beyond simple sparsity, signals of interest often exhibit an underlying sparsity pattern which, if leveraged, can improve reconstruction performance. However, the sparsity pattern is usually unknown a priori. Inspired by the k-nearest neighbor (k-NN) algorithm, we propose an efficient algorithm termed approximate message passing with nearest neighbor sparsity pattern learning (AMP-NNSPL), which learns the sparsity pattern adaptively. AMP-NNSPL specifies a flexible spike-and-slab prior on the unknown signal and, after each AMP iteration, sets the sparse ratios to the average of the nearest neighbor estimates via expectation maximization (EM). Experimental results on both synthetic and real data demonstrate the superiority of our proposed algorithm in terms of both reconstruction performance and computational complexity.
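The nearest-neighbor step of AMP-NNSPL sets each entry's sparse ratio from its neighbors' estimates. This sketch covers only that averaging step for a 1-D signal, not the full AMP recursion or EM derivation. It assumes each entry's neighborhood is its immediate left and right entries (truncated at the edges) and that the inputs are per-entry posterior probabilities of being non-zero from the latest AMP iteration. Both assumptions are illustrative, and `nn_sparsity_update` is a hypothetical name.

```python
from typing import List, Sequence


def nn_sparsity_update(support_prob: Sequence[float]) -> List[float]:
    """Set each entry's sparse ratio to the mean of its neighbours' support probabilities.

    Assumption for illustration: the neighbourhood of index i is {i-1, i+1},
    truncated at the signal edges; support_prob[j] is the posterior probability
    that x_j is non-zero after the latest AMP iteration.
    """
    n = len(support_prob)
    ratios = []
    for i in range(n):
        neighbours = [support_prob[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if not neighbours:  # single-entry signal: fall back to its own estimate
            ratios.append(support_prob[i])
        else:
            ratios.append(sum(neighbours) / len(neighbours))
    return ratios


if __name__ == "__main__":
    # A clustered support: entries 1-2 are active, their neighbours are not.
    print(nn_sparsity_update([0.0, 1.0, 1.0, 0.0, 0.0]))
```

Entries next to an active cluster receive a raised prior probability of being non-zero in the next AMP iteration. This is how the clustered structure is exploited without being known in advance.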
IT · Apr 16, 2012
Rateless Codes with Progressive Recovery for Layered Multimedia Delivery
Zhao Chen, Liuguo Yin, Mai Xu et al.
This paper proposes a novel approach, based on unequal error protection (UEP), to enhance rateless codes with progressive recovery for layered multimedia delivery. With a parallel encoding structure, the proposed Progressive Rateless Codes (PRC) assign unequal redundancy to each layer according to its importance. Each output symbol contains information from all layers, so the stream layers can be recovered progressively at the expected ratios of received output symbols. Furthermore, the dependency between layers is naturally taken into account. The performance of PRC is evaluated and compared with several related UEP approaches. Results show, both theoretically and numerically, that our PRC approach provides better recovery performance with lower overhead.
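Unequal error protection in a rateless code can be illustrated with an LT-style encoder. Each output symbol is the XOR of a few source symbols, and more important layers are sampled more often, so they appear in more output symbols. This is a generic weighted sketch, not PRC's parallel encoding structure. The fixed `degree`, the `layer_weights`, and the function name are hypothetical, and real LT codes draw the degree from a distribution such as robust soliton.

```python
import random
from functools import reduce
from operator import xor
from typing import List, Sequence, Tuple

EncodedSymbol = Tuple[List[int], int]  # (indices of source symbols XORed, value)


def uep_encode(
    layers: Sequence[Sequence[int]],  # non-empty layers of source symbols, most important first
    layer_weights: Sequence[float],   # hypothetical: relative protection per layer
    num_output: int,
    degree: int = 3,                  # hypothetical fixed degree
    seed: int = 0,
) -> List[EncodedSymbol]:
    """Produce output symbols whose neighbours favour heavily weighted layers."""
    rng = random.Random(seed)
    flat = [s for layer in layers for s in layer]
    # Global index range occupied by each layer in `flat`.
    spans, start = [], 0
    for layer in layers:
        spans.append(range(start, start + len(layer)))
        start += len(layer)
    out = []
    for _ in range(num_output):
        picked = set()
        while len(picked) < min(degree, len(flat)):
            # Choose a layer in proportion to its weight, then a symbol within it.
            layer_idx = rng.choices(range(len(layers)), weights=layer_weights)[0]
            picked.add(rng.choice(spans[layer_idx]))
        idx = sorted(picked)
        out.append((idx, reduce(xor, (flat[i] for i in idx), 0)))
    return out


if __name__ == "__main__":
    base_layer, enhancement_layer = [1, 2, 3, 4], [5, 6, 7, 8]
    for symbol in uep_encode([base_layer, enhancement_layer], [0.8, 0.2], num_output=5):
        print(symbol)
```

A peeling (belief-propagation) decoder, not shown here, would typically recover the heavily weighted base layer from fewer received symbols than the enhancement layer. This is the progressive-recovery behavior the abstract describes.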