CVApr 2
Rapidly deploying on-device eye tracking by distilling visual foundation modelsCheng Jiang, Jogendra Kundu, David Colmenares et al.
Eye tracking (ET) plays a critical role in augmented and virtual reality applications. However, rapidly deploying high-accuracy, on-device gaze estimation for new products remains challenging because hardware configurations (e.g., camera placement, camera pose, and illumination) often change across device generations. Visual foundation models (VFMs) are a promising direction for rapid training and deployment, and they excel on natural-image benchmarks; yet we find that off-the-shelf VFMs still struggle to achieve high accuracy on specialized near-eye infrared imagery. To address this gap, we introduce DistillGaze, a framework that distills a foundation model by leveraging labeled synthetic data and unlabeled real data for rapid and high-performance on-device gaze estimation. DistillGaze proceeds in two stages. First, we adapt a VFM into a domain-specialized teacher using self-supervised learning on labeled synthetic and unlabeled real images. Synthetic data provides scalable, high-quality gaze supervision, while unlabeled real data helps bridge the synthetic-to-real domain gap. Second, we train an on-device student using both teacher guidance and self-training. Evaluated on a large-scale, crowd-sourced dataset spanning over 2,000 participants, DistillGaze reduces median gaze error by 58.62% relative to synthetic-only baselines while maintaining a lightweight 256K-parameter model suitable for real-time on-device deployment. Overall, DistillGaze provides an efficient pathway for training and deploying ET models that adapt to hardware changes, and offers a recipe for combining synthetic supervision with unlabeled real data in on-device regression tasks.
OPTICSMar 12
Physics-Guided Inverse Design of Optical Waveforms for Nonlinear Electromagnetic DynamicsHao Zhang, Jack Hirschman, Randy Lemons et al.
Structured optical waveforms are emerging as powerful control fields for the next generation of complex photonic and electromagnetic systems, where the temporal structure of light can determine the ultimate performance of scientific instruments. However, identifying optimal optical drive fields in strongly nonlinear regimes remains challenging because the mapping between optical inputs and system response is high-dimensional and typically accessible only through computationally expensive simulations. Here, we present a physics-guided deep learning framework for the inverse design of optical temporal waveforms. By training a light-weighted surrogate model on simulations, the method enables gradient-based synthesis of optical profiles that compensate nonlinear field distortions in driven particle-field systems. As a representative application, we apply the approach to the generation of electron beams used in advanced photon and particle sources. The learned optical waveform actively suppresses extrinsic emittance growth by more than 52% compared with conventional Gaussian operation and by approximately 9% relative to the theoretical flattop limit in simulation. We further demonstrate experimental feasibility by synthesizing the predicted waveform using a programmable pulse-shaping platform; incorporating the measured optical profile into beamline simulations yields a 31% reduction in the extrinsic emittance contribution. Beyond accelerator applications, this work establishes a general way for physics-guided inverse design of optical control fields, enabling structured light to approach fundamental performance limits in nonlinear photonic and high-frequency electromagnetic systems.
LGNov 16, 2019
What Will Your Child Look Like? DNA-Net: Age and Gender Aware Kin Face SynthesizerPengyu Gao, Siyu Xia, Joseph Robinson et al.
Visual kinship recognition aims to identify blood relatives from facial images. Its practical application-- like in law-enforcement, video surveillance, automatic family album management, and more-- has motivated many researchers to put forth effort on the topic as of recent. In this paper, we focus on a new view of visual kinship technology: kin-based face generation. Specifically, we propose a two-stage kin-face generation model to predict the appearance of a child given a pair of parents. The first stage includes a deep generative adversarial autoencoder conditioned on ages and genders to map between facial appearance and high-level features. The second stage is our proposed DNA-Net, which serves as a transformation between the deep and genetic features based on a random selection process to fuse genes of a parent pair to form the genes of a child. We demonstrate the effectiveness of the proposed method quantitatively and qualitatively: quantitatively, pre-trained models and human subjects perform kinship verification on the generated images of children; qualitatively, we show photo-realistic face images of children that closely resemble the given pair of parents. In the end, experiments validate that the proposed model synthesizes convincing kin-faces using both subjective and objective standards.
CVApr 20, 2019
EV-Action: Electromyography-Vision Multi-Modal Action DatasetLichen Wang, Bin Sun, Joseph Robinson et al.
Multi-modal human action analysis is a critical and attractive research topic. However, the majority of the existing datasets only provide visual modalities (i.e., RGB, depth and skeleton). To make up this, we introduce a new, large-scale EV-Action dataset in this work, which consists of RGB, depth, electromyography (EMG), and two skeleton modalities. Compared with the conventional datasets, EV-Action dataset has two major improvements: (1) we deploy a motion capturing system to obtain high quality skeleton modality, which provides more comprehensive motion information including skeleton, trajectory, acceleration with higher accuracy, sampling frequency, and more skeleton markers. (2) we introduce an EMG modality which is usually used as an effective indicator in the biomechanics area, also it has yet to be well explored in motion related research. To the best of our knowledge, this is the first action dataset with EMG modality. The details of EV-Action dataset are clarified, meanwhile, a simple yet effective framework for EMG-based action recognition is proposed. Moreover, state-of-the-art baselines are applied to evaluate the effectiveness of all the modalities. The obtained result clearly shows the validity of EMG modality in human action analysis tasks. We hope this dataset can make significant contributions to human motion analysis, computer vision, machine learning, biomechanics, and other interdisciplinary fields.