CLJun 6, 2024
Proofread: Fixes All Errors with One TapRenjie Liu, Yanxiang Zhang, Yun Zhu et al.
The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to model tuning and deployment. To obtain models with sufficient quality, we implement a careful data synthetic pipeline tailored to online use cases, design multifaceted metrics, employ a two-stage tuning approach to acquire the dedicated LLM for the feature: the Supervised Fine Tuning (SFT) for foundational quality, followed by the Reinforcement Learning (RL) tuning approach for targeted refinement. Specifically, we find sequential tuning on Rewrite and proofread tasks yields the best quality in SFT stage, and propose global and direct rewards in the RL tuning stage to seek further improvement. Extensive experiments on a human-labeled golden set showed our tuned PaLM2-XS model achieved 85.56\% good ratio. We launched the feature to Pixel 8 devices by serving the model on TPU v5 in Google Cloud, with thousands of daily active users. Serving latency was significantly reduced by quantization, bucket inference, text segmentation, and speculative decoding. Our demo could be seen in \href{https://youtu.be/4ZdcuiwFU7I}{Youtube}.
HCFeb 18, 2021
TapNet: The Design, Training, Implementation, and Applications of a Multi-Task Learning CNN for Off-Screen Mobile InputMichael Xuelin Huang, Yang Li, Nazneen Nazneen et al.
To make off-screen interaction without specialized hardware practical, we investigate using deep learning methods to process the common built-in IMU sensor (accelerometers and gyroscopes) on mobile phones into a useful set of one-handed interaction events. We present the design, training, implementation and applications of TapNet, a multi-task network that detects tapping on the smartphone. With phone form factor as auxiliary information, TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location. We developed two datasets consisting of over 135K training samples, 38K testing samples, and 32 participants in total. Experimental evaluation demonstrated the effectiveness of the TapNet design and its significant improvement over the state of the art. Along with the datasets, (https://sites.google.com/site/michaelxlhuang/datasets/tapnet-dataset), and extensive experiments, TapNet establishes a new technical foundation for off-screen mobile input.
HCMar 10, 2019
SacCalib: Reducing Calibration Distortion for Stationary Eye Trackers Using Saccadic Eye MovementsMichael Xuelin Huang, Andreas Bulling
Recent methods to automatically calibrate stationary eye trackers were shown to effectively reduce inherent calibration distortion. However, these methods require additional information, such as mouse clicks or on-screen content. We propose the first method that only requires users' eye movements to reduce calibration distortion in the background while users naturally look at an interface. Our method exploits that calibration distortion makes straight saccade trajectories appear curved between the saccadic start and end points. We show that this curving effect is systematic and the result of distorted gaze projection plane. To mitigate calibration distortion, our method undistorts this plane by straightening saccade trajectories using image warping. We show that this approach improves over the common six-point calibration and is promising for reducing distortion. As such, it provides a non-intrusive solution to alleviating accuracy decrease of eye tracker during long-term use.
HCDec 19, 2018
Privacy-Aware Eye Tracking Using Differential PrivacyJulian Steil, Inken Hagestedt, Michael Xuelin Huang et al.
With eye tracking being increasingly integrated into virtual and augmented reality (VR/AR) head-mounted displays, preserving users' privacy is an ever more important, yet under-explored, topic in the eye tracking community. We report a large-scale online survey (N=124) on privacy aspects of eye tracking that provides the first comprehensive account of with whom, for which services, and to what extent users are willing to share their gaze data. Using these insights, we design a privacy-aware VR interface that uses differential privacy, which we evaluate on a new 20-participant dataset for two privacy sensitive tasks: We show that our method can prevent user re-identification and protect gender information while maintaining high performance for gaze-based document type classification. Our results highlight the privacy challenges particular to gaze data and demonstrate that differential privacy is a potential means to address them. Thus, this paper lays important foundations for future research on privacy-aware gaze interfaces.