LGJul 17, 2025
Apple Intelligence Foundation Language Models: Tech Report 2025Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang et al. · apple-ml, cmu
We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: i a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and ii a scalable server model built on a novel Parallel-Track Mixture-of-Experts PT-MoE transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.
LGAug 7, 2025
Optimal Corpus Aware Training for Neural Machine TranslationYi-Hsiu Liao, Cheng Shen, Brenda et al.
Corpus Aware Training (CAT) leverages valuable corpus metadata during training by injecting corpus information into each training example, and has been found effective in the literature, commonly known as the "tagging" approach. Models trained with CAT inherently learn the quality, domain and nuance between corpora directly from data, and can easily switch to different inference behavior. To achieve the best evaluation, CAT models pre-define a group of high quality data before training starts which can be error-prone and inefficient. In this work, we propose Optimal Corpus Aware Training (OCAT), which fine-tunes a CAT pre-trained model by freezing most of the model parameters and only tuning small set of corpus-related parameters. We show that OCAT is lightweight, resilient to overfitting, and effective in boosting model accuracy. We use WMT23 English to Chinese and English to German translation tasks as our test ground and show +3.6 and +1.8 chrF improvement, respectively, over vanilla training. Furthermore, our approach is on-par or slightly better than other state-of-the-art fine-tuning techniques while being less sensitive to hyperparameter settings.
LGApr 24, 2025
Conformal Segmentation in Industrial Surface Defect Detection with Statistical GuaranteesCheng Shen, Yuewei Liu
In industrial settings, surface defects on steel can significantly compromise its service life and elevate potential safety risks. Traditional defect detection methods predominantly rely on manual inspection, which suffers from low efficiency and high costs. Although automated defect detection approaches based on Convolutional Neural Networks(e.g., Mask R-CNN) have advanced rapidly, their reliability remains challenged due to data annotation uncertainties during deep model training and overfitting issues. These limitations may lead to detection deviations when processing the given new test samples, rendering automated detection processes unreliable. To address this challenge, we first evaluate the detection model's practical performance through calibration data that satisfies the independent and identically distributed (i.i.d) condition with test data. Specifically, we define a loss function for each calibration sample to quantify detection error rates, such as the complement of recall rate and false discovery rate. Subsequently, we derive a statistically rigorous threshold based on a user-defined risk level to identify high-probability defective pixels in test images, thereby constructing prediction sets (e.g., defect regions). This methodology ensures that the expected error rate (mean error rate) on the test set remains strictly bounced by the predefined risk level. Additionally, we observe a negative correlation between the average prediction set size and the risk level on the test set, establishing a statistically rigorous metric for assessing detection model uncertainty. Furthermore, our study demonstrates robust and efficient control over the expected test set error rate across varying calibration-to-test partitioning ratios, validating the method's adaptability and operational effectiveness.
QMNov 8, 2021
Stain-free Detection of Embryo Polarization using Deep LearningCheng Shen, Adiyant Lamba, Meng Zhu et al.
Polarization of the mammalian embryo at the right developmental time is critical for its development to term and would be valuable in assessing the potential of human embryos. However, tracking polarization requires invasive fluorescence staining, impermissible in the in vitro fertilization clinic. Here, we report the use of artificial intelligence to detect polarization from unstained time-lapse movies of mouse embryos. We assembled a dataset of bright-field movie frames from 8-cell-stage embryos, side-by-side with corresponding images of fluorescent markers of cell polarization. We then used an ensemble learning model to detect whether any bright-field frame showed an embryo before or after onset of polarization. Our resulting model has an accuracy of 85% for detecting polarization, significantly outperforming human volunteers trained on the same data (61% accuracy). We discovered that our self-learning model focuses upon the angle between cells as one known cue for compaction, which precedes polarization, but it outperforms the use of this cue alone. By compressing three-dimensional time-lapsed image data into two-dimensions, we are able to reduce data to an easily manageable size for deep learning processing. In conclusion, we describe a method for detecting a key developmental feature of embryo development that avoids clinically impermissible fluorescence staining.
IVApr 15, 2021
BAM: A Balanced Attention Mechanism for Single Image Super ResolutionFanyi Wang, Haotian Hu, Cheng Shen
Recovering texture information from the aliasing regions has always been a major challenge for Single Image Super Resolution (SISR) task. These regions are often submerged in noise so that we have to restore texture details while suppressing noise. To address this issue, we propose a Balanced Attention Mechanism (BAM), which consists of Avgpool Channel Attention Module (ACAM) and Maxpool Spatial Attention Module (MSAM) in parallel. ACAM is designed to suppress extreme noise in the large scale feature maps while MSAM preserves high-frequency texture details. Thanks to the parallel structure, these two modules not only conduct self-optimization, but also mutual optimization to obtain the balance of noise reduction and high-frequency texture restoration during the back propagation process, and the parallel structure makes the inference faster. To verify the effectiveness and robustness of BAM, we applied it to 10 SOTA SISR networks. The results demonstrate that BAM can efficiently improve the networks performance, and for those originally with attention mechanism, the substitution with BAM further reduces the amount of parameters and increases the inference speed. Moreover, we present a dataset with rich texture aliasing regions in real scenes, named realSR7. Experiments prove that BAM achieves better super-resolution results on the aliasing area.
CRMar 19, 2021
An Experiment Study on Federated LearningTestbedCheng Shen, Wanli Xue
While the Internet of Things (IoT) can benefit from machine learning by outsourcing model training on the cloud, user data exposure to an untrusted cloud service provider can pose threat to user privacy. Recently, federated learning is proposed as an approach for privacy-preserving machine learning (PPML) for the IoT, while its practicability remains unclear. This work presents the evaluation on the efficiency and privacy performance of a readily available federated learning framework based on PySyft, a Python library for distributed deep learning. It is observed that the training speed of the framework is significantly slower than of the centralized approach due to communication overhead. Meanwhile, the framework bears some vulnerability to potential man-in-the-middle attacks at the network level. The report serves as a starting point for PPML performance analysis and suggests the future direction for PPML framework development.