Yarong Mu

CR
3 papers
6 citations
Novelty: 45%
AI Score: 36

3 Papers

LG · Oct 4, 2022
Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints

Virat Shejwalkar, Arun Ganesh, Rajiv Mathews et al. · cmu

In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints during training to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in prediction accuracy over the existing state-of-the-art for the StackOverflow, CIFAR10 and CIFAR100 datasets. For instance, we improve the state-of-the-art DP StackOverflow accuracies to 22.74% (+2.06% relative) for ε=8.2, and 23.90% (+2.09%) for ε=18.9. Furthermore, these gains magnify in settings with periodically varying training data distributions. We also demonstrate that our methods achieve relative improvements of 0.54% and 62.6% in terms of utility and variance, respectively, on a proprietary, production-grade pCVR task. Lastly, we initiate an exploration into estimating the uncertainty (variance) that DP noise adds in the predictions of DP ML models. We prove that, under standard assumptions on the loss function, the sample variance from the last few checkpoints provides a good approximation of the variance of the final model of a DP run. Empirically, we show that the last few checkpoints can provide a reasonable lower bound for the variance of a converged DP model. Crucially, all the methods proposed in this paper operate on a single training run of the DP ML technique, thus incurring no additional privacy cost.
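The two ideas in this abstract, aggregating intermediate checkpoints and using the last few checkpoints to estimate prediction variance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy linear model, the uniform parameter average (only one possible aggregate), the function names, and the checkpoint values. It is not the paper's implementation.

```python
from statistics import variance


def average_checkpoints(checkpoints):
    """Uniformly average parameter vectors, one vector per checkpoint."""
    n = len(checkpoints)
    dim = len(checkpoints[0])
    return [sum(ckpt[i] for ckpt in checkpoints) / n for i in range(dim)]


def predict(params, x):
    """Toy linear model (assumed): dot product of parameters and features."""
    return sum(p * xi for p, xi in zip(params, x))


def prediction_variance(checkpoints, x):
    """Sample variance of one input's prediction across checkpoints."""
    return variance([predict(ckpt, x) for ckpt in checkpoints])


# Hypothetical parameters saved at the last three checkpoints of ONE run.
last_checkpoints = [[1.0, 2.0], [1.2, 2.0], [0.8, 2.3]]
x = [1.0, 1.0]

avg_params = average_checkpoints(last_checkpoints)      # aggregate model
var_estimate = prediction_variance(last_checkpoints, x)  # uncertainty proxy
```

Both quantities are computed from checkpoints that a single training run already produces, which matches the abstract's point that no extra privacy cost is incurred.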

PF · Apr 16
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU

Jevin Jiang, Ying Chen, Blake A. Hechtman et al.

Large Language Model (LLM) deployment is increasingly shifting to cost-efficient accelerators like Google's Tensor Processing Units (TPUs), prioritizing both performance and total cost of ownership (TCO). However, existing LLM inference kernels and serving systems remain largely GPU-centric, and there is no well-established approach for efficiently mapping LLM workloads onto TPU architectures--particularly under the dynamic and ragged execution patterns common in modern serving. In this paper, we present Ragged Paged Attention (RPA), a high-performance and flexible attention kernel for TPUs, implemented using Pallas and Mosaic. RPA addresses these challenges through three key techniques: (1) fine-grained tiling to enable efficient dynamic slicing over ragged memory, (2) a custom software pipeline that fuses KV cache updates with attention computation, and (3) a distribution-aware compilation strategy that generates specialized kernels for decode, prefill, and mixed workloads. Evaluated on Llama 3 8B on TPU7x, RPA achieves up to 86% memory bandwidth utilization (MBU) in decode and 73% model FLOPs utilization (MFU) in prefill. Integrated as the primary TPU backend in vLLM and SGLang, RPA provides a production-grade foundation for efficient TPU inference and offers practical insights into kernel design.
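To make the "ragged" and "paged" terms concrete, the following sketch shows only the indexing pattern: sequences of different lengths read their keys and values from fixed-size pages through a per-sequence page table. This is plain Python for illustration, not the RPA kernel and not Pallas code; the page size, the data layout, and the names `gather_kv` and `attend` are all hypothetical.

```python
import math

PAGE_SIZE = 2  # tokens per page (assumed; tiny for illustration)


def gather_kv(pages, page_table, seq_len):
    """Collect the first seq_len entries of one sequence from its pages."""
    out = []
    for page_id in page_table:
        for slot in pages[page_id]:
            if len(out) == seq_len:
                return out
            out.append(slot)
    return out


def attend(query, keys, values):
    """Single-query softmax attention over the gathered keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


# Physical pages; unused slots hold padding that must never be read.
key_pages = [
    [[0.0, 1.0], [9.0, 9.0]],  # page 0 (second slot unused)
    [[1.0, 1.0], [9.0, 9.0]],  # page 1 (second slot unused)
    [[1.0, 0.0], [1.0, 0.0]],  # page 2
]
value_pages = [
    [[3.0, 0.0], [0.0, 0.0]],
    [[5.0, 6.0], [0.0, 0.0]],
    [[1.0, 0.0], [2.0, 0.0]],
]

# Ragged batch: sequence 0 has 3 tokens on pages 2 then 0; sequence 1 has 1.
page_tables = [[2, 0], [1]]
seq_lens = [3, 1]
queries = [[0.0, 0.0], [1.0, 1.0]]

outputs = []
for table, length, query in zip(page_tables, seq_lens, queries):
    keys = gather_kv(key_pages, table, length)
    values = gather_kv(value_pages, table, length)
    outputs.append(attend(query, keys, values))
```

A real kernel would avoid materializing the gathered lists and would tile the reads to the hardware; the sketch only shows why per-sequence lengths and page tables are needed to skip padding.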

CR · Jul 26, 2018
Topological Graphic Passwords And Their Matchings Towards Cryptography

Bing Yao, Hui Sun, Xiaohui Zhang et al.

Graphical passwords (GPWs) are convenient for mobile devices with touch screens. Topological graphic passwords (Topsnut-gpws) can be stored in a computer as classical matrices and run faster than existing GPWs. Because they have many advantages, we study Topsnut-gpws from the viewpoint of matching. We discuss configuration matching partitions, coloring/labelling matching partitions, set matching partitions, matching chains, and related notions. We also introduce new graph labellings to enrich Topsnut-matchings and show that these labellings can be realized for trees or spanning trees of networks. In our theoretical work we explore Graph Labelling Analysis and show that every graph admits our extremal labellings and set-type labellings in graph theory. Many of the graph labellings mentioned are related to problems of set matching partitions and number theory, and yield new objects and new problems for graph theory.