Tyler Johnson

LG
h-index27
4papers
102citations
Novelty54%
AI Score35

4 Papers

LGJan 30, 2023
Probabilistic Neural Data Fusion for Learning from an Arbitrary Number of Multi-fidelity Data Sets

Carlos Mora, Jonathan Tammer Eweis-Labolle, Tyler Johnson et al.

In many applications in engineering and sciences analysts have simultaneous access to multiple data sources. In such cases, the overall cost of acquiring information can be reduced via data fusion or multi-fidelity (MF) modeling where one leverages inexpensive low-fidelity (LF) sources to reduce the reliance on expensive high-fidelity (HF) data. In this paper, we employ neural networks (NNs) for data fusion in scenarios where data is very scarce and obtained from an arbitrary number of sources with varying levels of fidelity and cost. We introduce a unique NN architecture that converts MF modeling into a nonlinear manifold learning problem. Our NN architecture inversely learns non-trivial (e.g., non-additive and non-hierarchical) biases of the LF sources in an interpretable and visualizable manifold where each data source is encoded via a low-dimensional distribution. This probabilistic manifold quantifies model form uncertainties such that LF sources with small bias are encoded close to the HF source. Additionally, we endow the output of our NN with a parametric distribution not only to quantify aleatoric uncertainties, but also to reformulate the network's loss function based on strictly proper scoring rules which improve robustness and accuracy on unseen HF data. Through a set of analytic and engineering examples, we demonstrate that our approach provides a high predictive power while quantifying various sources uncertainties.

LGJul 17, 2025
Apple Intelligence Foundation Language Models: Tech Report 2025

Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang et al. · apple-ml, cmu

We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: i a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and ii a scalable server model built on a novel Parallel-Track Mixture-of-Experts PT-MoE transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines. A new Swift-centric Foundation Models framework exposes guided generation, constrained tool calling, and LoRA adapter fine-tuning, allowing developers to integrate these capabilities with a few lines of code. The latest advancements in Apple Intelligence models are grounded in our Responsible AI approach with safeguards like content filtering and locale-specific evaluation, as well as our commitment to protecting our users' privacy with innovations like Private Cloud Compute.

AINov 12, 2024
Towards Low-bit Communication for Tensor Parallel LLM Inference

Harry Dong, Tyler Johnson, Minsik Cho et al.

Tensor parallelism provides an effective way to increase server large language model (LLM) inference efficiency despite adding an additional communication cost. However, as server LLMs continue to scale in size, they will need to be distributed across more devices, magnifying the communication cost. One way to approach this problem is with quantization, but current methods for LLMs tend to avoid quantizing the features that tensor parallelism needs to communicate. Taking advantage of consistent outliers in communicated features, we introduce a quantization method that reduces communicated values on average from 16 bits to 4.2 bits while preserving nearly all of the original performance. For instance, our method maintains around 98.0% and 99.5% of Gemma 2 27B's and Llama 2 13B's original performance, respectively, averaged across all tasks we evaluated on.

LGMay 14, 2025
Should We Simultaneously Calibrate Multiple Computer Models?

Jonathan Tammer Eweis-Labolle, Tyler Johnson, Xiangyu Sun et al.

In an increasing number of applications designers have access to multiple computer models which typically have different levels of fidelity and cost. Traditionally, designers calibrate these models one at a time against some high-fidelity data (e.g., experiments). In this paper, we question this tradition and assess the potential of calibrating multiple computer models at the same time. To this end, we develop a probabilistic framework that is founded on customized neural networks (NNs) that are designed to calibrate an arbitrary number of computer models. In our approach, we (1) consider the fact that most computer models are multi-response and that the number and nature of calibration parameters may change across the models, and (2) learn a unique probability distribution for each calibration parameter of each computer model, (3) develop a loss function that enables our NN to emulate all data sources while calibrating the computer models, and (4) aim to learn a visualizable latent space where model-form errors can be identified. We test the performance of our approach on analytic and engineering problems to understand the potential advantages and pitfalls in simultaneous calibration of multiple computer models. Our method can improve predictive accuracy, however, it is prone to non-identifiability issues in higher-dimensional input spaces that are normally constrained by underlying physics.