CLAug 22, 2023
Towards an On-device Agent for Text RewritingYun Zhu, Yinxiao Liu, Felix Stahlberg et al. · deepmind
Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a small size with the need to retain the emergent capabilities of the LLM, that requires costly data collection. To address the above challenge, we introduce a new instruction tuning approach for building a mobile-centric text rewriting model. Our strategies enable the generation of high quality training data without any human labeling. In addition, we propose a heuristic reinforcement learning framework which substantially enhances performance without requiring preference data. To further bridge the performance gap with the larger server-side model, we propose an effective approach that combines the mobile rewrite agent with the server model using a cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions. Through empirical experiments, we demonstrate that our on-device model surpasses the current state-of-the-art LLMs in text rewriting while maintaining a significantly reduced model size. Notably, we show that our proposed cascading approach improves model performance.
CVApr 21, 2023
Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware OptimizationsYu-Hui Chen, Raman Sarokin, Juhyun Lee et al.
The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
CLMar 13, 2024
Gemma: Open Models Based on Gemini Research and TechnologyGemma Team, Thomas Mesnard, Cassidy Hardin et al. · deepmind
This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.
CVNov 11, 2015
Multimodal MRI Neuroimaging with Motion Compensation Based on Particle FilteringYu-Hui Chen, Roni Mittelman, Boklye Kim et al.
Head movement during scanning impedes activation detection in fMRI studies. Head motion in fMRI acquired using slice-based Echo Planar Imaging (EPI) can be estimated and compensated by aligning the images onto a reference volume through image registration. However, registering EPI images volume to volume fails to consider head motion between slices, which may lead to severely biased head motion estimates. Slice-to-volume registration can be used to estimate motion parameters for each slice by more accurately representing the image acquisition sequence. However, accurate slice to volume mapping is dependent on the information content of the slices: middle slices are information rich, while edge slides are information poor and more prone to distortion. In this work, we propose a Gaussian particle filter based head motion tracking algorithm to reduce the image misregistration errors. The algorithm uses a dynamic state space model of head motion with an observation equation that models continuous slice acquisition of the scanner. Under this model the particle filter provides more accurate motion estimates and voxel position estimates. We demonstrate significant performance improvement of the proposed approach as compared to registration-only methods of head motion estimation and brain activation detection.
MLMar 15, 2015
Statistical Estimation and Clustering of Group-invariant Orientation ParametersYu-Hui Chen, Dennis Wei, Gregory Newstadt et al.
We treat the problem of estimation of orientation parameters whose values are invariant to transformations from a spherical symmetry group. Previous work has shown that any such group-invariant distribution must satisfy a restricted finite mixture representation, which allows the orientation parameter to be estimated using an Expectation Maximization (EM) maximum likelihood (ML) estimation algorithm. In this paper, we introduce two parametric models for this spherical symmetry group estimation problem: 1) the hyperbolic Von Mises Fisher (VMF) mixture distribution and 2) the Watson mixture distribution. We also introduce a new EM-ML algorithm for clustering samples that come from mixtures of group-invariant distributions with different parameters. We apply the models to the problem of mean crystal orientation estimation under the spherically symmetric group associated with the crystal form, e.g., cubic or octahedral or hexahedral. Simulations and experiments establish the advantages of the extended EM-VMF and EM-Watson estimators for data acquired by Electron Backscatter Diffraction (EBSD) microscopy of a polycrystalline Nickel alloy sample.
CVFeb 26, 2015
A Dictionary Approach to EBSD IndexingYu-Hui Chen, Se Un Park, Dennis Wei et al.
We propose a framework for indexing of grain and sub-grain structures in electron backscatter diffraction (EBSD) images of polycrystalline materials. The framework is based on a previously introduced physics-based forward model by Callahan and De Graef (2013) relating measured patterns to grain orientations (Euler angle). The forward model is tuned to the microscope and the sample symmetry group. We discretize the domain of the forward model onto a dense grid of Euler angles and for each measured pattern we identify the most similar patterns in the dictionary. These patterns are used to identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixels 3x3 neighborhood. Indexing is accomplished by computing the mean orientation of the closest dictionary matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of Von Mises-Fisher distributions over the quaternionic 3-sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification. We demonstrate the proposed dictionary-based approach on a Ni-base IN100 alloy.
CVFeb 26, 2015
Coercive Region-level Registration for Multi-modal ImagesYu-Hui Chen, Dennis Wei, Gregory Newstadt et al.
We propose a coercive approach to simultaneously register and segment multi-modal images which share similar spatial structure. Registration is done at the region level to facilitate data fusion while avoiding the need for interpolation. The algorithm performs alternating minimization of an objective function informed by statistical models for pixel values in different modalities. Hypothesis tests are developed to determine whether to refine segmentations by splitting regions. We demonstrate that our approach has significantly better performance than the state-of-the-art registration and segmentation methods on microscopy images.
MLNov 10, 2014
Parameter estimation in spherical symmetry groupsYu-Hui Chen, Dennis Wei, Gregory Newstadt et al.
This paper considers statistical estimation problems where the probability distribution of the observed random variable is invariant with respect to actions of a finite topological group. It is shown that any such distribution must satisfy a restricted finite mixture representation. When specialized to the case of distributions over the sphere that are invariant to the actions of a finite spherical symmetry group $\mathcal G$, a group-invariant extension of the Von Mises Fisher (VMF) distribution is obtained. The $\mathcal G$-invariant VMF is parameterized by location and scale parameters that specify the distribution's mean orientation and its concentration about the mean, respectively. Using the restricted finite mixture representation these parameters can be estimated using an Expectation Maximization (EM) maximum likelihood (ML) estimation algorithm. This is illustrated for the problem of mean crystal orientation estimation under the spherically symmetric group associated with the crystal form, e.g., cubic or octahedral or hexahedral. Simulations and experiments establish the advantages of the extended VMF EM-ML estimator for data acquired by Electron Backscatter Diffraction (EBSD) microscopy of a polycrystalline Nickel alloy sample.