OPTICSSep 17, 2023
All-optical image denoising using a diffractive visual processorCagatay Isıl, Tianyi Gan, F. Onuralp Ardic et al.
Image denoising, one of the essential inverse problems, targets to remove noise/artifacts from input images. In general, digital image denoising algorithms, executed on computers, present latency due to several iterations implemented in, e.g., graphics processing units (GPUs). While deep learning-enabled methods can operate non-iteratively, they also introduce latency and impose a significant computational burden, leading to increased power consumption. Here, we introduce an analog diffractive image denoiser to all-optically and non-iteratively clean various forms of noise and artifacts from input images - implemented at the speed of light propagation within a thin diffractive visual processor. This all-optical image denoiser comprises passive transmissive layers optimized using deep learning to physically scatter the optical modes that represent various noise features, causing them to miss the output image Field-of-View (FoV) while retaining the object features of interest. Our results show that these diffractive denoisers can efficiently remove salt and pepper noise and image rendering-related spatial artifacts from input phase or intensity images while achieving an output power efficiency of ~30-40%. We experimentally demonstrated the effectiveness of this analog denoiser architecture using a 3D-printed diffractive visual processor operating at the terahertz spectrum. Owing to their speed, power-efficiency, and minimal computational overhead, all-optical diffractive denoisers can be transformative for various image display and projection systems, including, e.g., holographic displays.
CVMay 14, 2022
Realistic Defocus Blur for Multiplane Computer-Generated HolographyKoray Kavaklı, Yuta Itoh, Hakan Urey et al.
This paper introduces a new multiplane CGH computation method to reconstruct artefact-free high-quality holograms with natural-looking defocus blur. Our method introduces a new targeting scheme and a new loss function. While the targeting scheme accounts for defocused parts of the scene at each depth plane, the new loss function analyzes focused and defocused parts separately in reconstructed images. Our method support phase-only CGH calculations using various iterative (e.g., Gerchberg-Saxton, Gradient Descent) and non-iterative (e.g., Double Phase) CGH techniques. We achieve our best image quality using a modified gradient descent-based optimization recipe where we introduce a constraint inspired by the double phase method. We validate our method experimentally using our proof-of-concept holographic display, comparing various algorithms, including multi-depth scenes with sparse and dense contents.
CVMar 8, 2022
Unrolled Primal-Dual Networks for Lensless CamerasOliver Kingshott, Nick Antipa, Emrah Bostan et al.
Conventional image reconstruction models for lensless cameras often assume that each measurement results from convolving a given scene with a single experimentally measured point-spread function. These image reconstruction models fall short in simulating lensless cameras truthfully as these models are not sophisticated enough to account for optical aberrations or scenes with depth variations. Our work shows that learning a supervised primal-dual reconstruction method results in image quality matching state of the art in the literature without demanding a large network capacity. This improvement stems from our primary finding that embedding learnable forward and adjoint models in a learned primal-dual optimization framework can even improve the quality of reconstructed images (+5dB PSNR) compared to works that do not correct for the model error. In addition, we built a proof-of-concept lensless camera prototype that uses a pseudo-random phase mask to demonstrate our point. Finally, we share the extensive evaluation of our learned model based on an open dataset and a dataset from our proof-of-concept lensless camera prototype.
HCDec 8, 2022
ChromaCorrect: Prescription Correction in Virtual Reality Headsets through Perceptual GuidanceAhmet Güzel, Jeanne Beyazian, Praneeth Chakravarthula et al.
A large portion of today's world population suffer from vision impairments and wear prescription eyeglasses. However, eyeglasses causes additional bulk and discomfort when used with augmented and virtual reality headsets, thereby negatively impacting the viewer's visual experience. In this work, we remedy the usage of prescription eyeglasses in Virtual Reality (VR) headsets by shifting the optical complexity completely into software and propose a prescription-aware rendering approach for providing sharper and immersive VR imagery. To this end, we develop a differentiable display and visual perception model encapsulating display-specific parameters, color and visual acuity of human visual system and the user-specific refractive errors. Using this differentiable visual perception model, we optimize the rendered imagery in the display using stochastic gradient-descent solvers. This way, we provide prescription glasses-free sharper images for a person with vision impairments. We evaluate our approach on various displays, including desktops and VR headsets, and show significant quality and contrast improvements for users with vision impairments.
CVJul 31, 2024
Learned Single-Pass Multitasking Perceptual Graphics for Immersive DisplaysDoğa Yılmaz, He Wang, Towaki Takikawa et al.
Emerging immersive display technologies efficiently utilize resources with perceptual graphics methods such as foveated rendering and denoising. Running multiple perceptual graphics methods challenges devices with limited power and computational resources. We propose a computationally-lightweight learned multitasking perceptual graphics model. Given RGB images and text-prompts, our model performs text-described perceptual tasks in a single inference step. Simply daisy-chaining multiple models or training dedicated models can lead to model management issues and exhaust computational resources. In contrast, our flexible method unlocks consistent high quality perceptual effects with reasonable compute, supporting various permutations at varied intensities using adjectives in text prompts (e.g. mildly, lightly). Text-guidance provides ease of use for dynamic requirements such as creative processes. To train our model, we propose a dataset containing source and perceptually enhanced images with corresponding text prompts. We evaluate our model on desktop and embedded platforms and validate perceptual quality through a user study.
CVNov 1, 2025
Text-guided Fine-Grained Video Anomaly DetectionJihao Gu, Kun Li, He Wang et al.
Video Anomaly Detection (VAD) aims to identify anomalous events within video segments. In scenarios such as surveillance or industrial process monitoring, anomaly detection is of critical importance. While existing approaches are semi-automated, requiring human assessment for anomaly detection, traditional VADs offer limited output as either normal or anomalous. We propose Text-guided Fine-Grained Video Anomaly Detection (T-VAD), a framework built upon Large Vision-Language Model (LVLM). T-VAD introduces an Anomaly Heatmap Decoder (AHD) that performs pixel-wise visual-textual feature alignment to generate fine-grained anomaly heatmaps. Furthermore, we design a Region-aware Anomaly Encoder (RAE) that transforms the heatmaps into learnable textual embeddings, guiding the LVLM to accurately identify and localize anomalous events in videos. This significantly enhances both the granularity and interactivity of anomaly detection. The proposed method achieving SOTA performance by demonstrating 94.8% Area Under the Curve (AUC, specifically micro-AUC) and 67.8%/76.7% accuracy in anomaly heatmaps (RBDC/TBDC) on the UBnormal dataset, and subjectively verified more preferable textual description on the ShanghaiTech-based dataset (BLEU-4: 62.67 for targets, 88.84 for trajectories; Yes/No accuracy: 97.67%), and on the UBnormal dataset (BLEU-4: 50.32 for targets, 78.10 for trajectories; Yes/No accuracy: 89.73%).
GRMar 10
Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation ModelsXinyao Zhuang, Jose Echevarria, Kaan Akşit
Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficiencies (CVD) remains largely unexplored. Our work systematically evaluates color accessibility in images generated by a common pretrained diffusion model, prompted to improve accessibility across diverse categories. We quantify performance using established, off-the-shelf CVD simulation methods and introduce "CVDLoss", a new metric measuring differences in image gradients indicative of structural detail. We validate CVDLoss against a commonly used daltonization method, demonstrating its sensitivity to color accessibility modifications. Applying CVDLoss to model outputs reveals that existing diffusion models struggle to reliably respond to accessibility-focused prompts. Consequently, our study establishes CVDLoss as a valuable evaluation tool for accessibility-aware image generation and post-processing, offering insights into current generative models' limitations in addressing color accessibility.
GRJun 10, 2025
Complex-Valued Holographic Radiance FieldsYicheng Zhan, Dong-Ha Shin, Seung-Hwan Baek et al.
Modeling the full properties of light, including both amplitude and phase, in 3D representations is crucial for advancing physically plausible rendering, particularly in holographic displays. To support these features, we propose a novel representation that optimizes 3D scenes without relying on intensity-based intermediaries. We reformulate 3D Gaussian splatting with complex-valued Gaussian primitives, expanding support for rendering with light waves. By leveraging RGBD multi-view images, our method directly optimizes complex-valued Gaussians as a 3D holographic scene representation. This eliminates the need for computationally expensive hologram re-optimization. Compared with state-of-the-art methods, our method achieves 30x-10,000x speed improvements while maintaining on-par image quality, representing a first step towards geometrically aligned, physically plausible holographic scene representations.
CVMar 24, 2024
Configurable Holography: Towards Display and Scene AdaptationYicheng Zhan, Liang Shi, Wojciech Matusik et al.
Emerging learned holography approaches have enabled faster and high-quality hologram synthesis, setting a new milestone toward practical holographic displays. However, these learned models require training a dedicated model for each set of display-scene parameters. To address this shortcoming, our work introduces a highly configurable learned model structure, synthesizing 3D holograms interactively while supporting diverse display-scene parameters. Our family of models relying on this structure can be conditioned continuously for varying novel scene parameters, including input images, propagation distances, volume depths, peak brightnesses, and novel display parameters of pixel pitches and wavelengths. Uniquely, our findings unearth a correlation between depth estimation and hologram synthesis tasks in the learning domain, leading to a learned model that unlocks accurate 3D hologram generation from 2D images across varied display-scene parameters. We validate our models by synthesizing high-quality 3D holograms in simulations and also verify our findings with two different holographic display prototypes. Moreover, our family of models can synthesize holograms with a 2x speed-up compared to the state-of-the-art learned holography approaches in the literature.
CVNov 19, 2025
Complex-Valued 2D Gaussian Representation for Computer-Generated HolographyYicheng Zhan, Xiangjun Gao, Long Quan et al.
We propose a new hologram representation based on structured complex-valued 2D Gaussian primitives, which replaces per-pixel information storage and reduces the parameter search space by up to 10:1. To enable end-to-end training, we develop a differentiable rasterizer for our representation, integrated with a GPU-optimized light propagation kernel in free space. Our extensive experiments show that our method achieves up to 2.5x lower VRAM usage and 50% faster optimization while producing higher-fidelity reconstructions than existing methods. We further introduce a conversion procedure that adapts our representation to practical hologram formats, including smooth and random phase-only holograms. Our experiments show that this procedure can effectively suppress noise artifacts observed in previous methods. By reducing the hologram parameter search space, our representation enables a more scalable hologram estimation in the next-generation computer-generated holography systems.
CVOct 15, 2025
Foveation Improves Payload Capacity in SteganographyLifeng Qiu Lin, Henry Kam, Qi Sun et al.
Steganography finds its use in visual medium such as providing metadata and watermarking. With support of efficient latent representations and foveated rendering, we trained models that improve existing capacity limits from 100 to 500 bits, while achieving better accuracy of up to 1 failure bit out of 2000, at 200K test bits. Finally, we achieve a comparable visual quality of 31.47 dB PSNR and 0.13 LPIPS, showing the effectiveness of novel perceptual design in creating multi-modal latent representations in steganography.
CVOct 2, 2025
Learned Display Radiance Fields with Lensless CamerasZiyang Chen, Yuta Itoh, Kaan Akşit
Calibrating displays is a basic and regular task that content creators must perform to maintain optimal visual experience, yet it remains a troublesome issue. Measuring display characteristics from different viewpoints often requires specialized equipment and a dark room, making it inaccessible to most users. To avoid specialized hardware requirements in display calibrations, our work co-designs a lensless camera and an Implicit Neural Representation based algorithm for capturing display characteristics from various viewpoints. More specifically, our pipeline enables efficient reconstruction of light fields emitted from a display from a viewing cone of 46.6° X 37.6°. Our emerging pipeline paves the initial steps towards effortless display calibration and characterization.
CVSep 29, 2025
Editing Physiological Signals in Videos Using Latent RepresentationsTianwen Zhou, Akshay Paruchuri, Josef Spjut et al. · stanford
Camera-based physiological signal estimation provides a non-contact and convenient means to monitor Heart Rate (HR). However, the presence of vital signals in facial videos raises significant privacy concerns, as they can reveal sensitive personal information related to the health and emotional states of an individual. To address this, we propose a learned framework that edits physiological signals in videos while preserving visual fidelity. First, we encode an input video into a latent space via a pretrained 3D Variational Autoencoder (3D VAE), while a target HR prompt is embedded through a frozen text encoder. We fuse them using a set of trainable spatio-temporal layers with Adaptive Layer Normalizations (AdaLN) to capture the strong temporal coherence of remote Photoplethysmography (rPPG) signals. We apply Feature-wise Linear Modulation (FiLM) in the decoder with a fine-tuned output layer to avoid the degradation of physiological signals during reconstruction, enabling accurate physiological modulation in the reconstructed video. Empirical results show that our method preserves visual quality with an average PSNR of 38.96 dB and SSIM of 0.98 on selected datasets, while achieving an average HR modulation error of 10.00 bpm MAE and 10.09% MAPE using a state-of-the-art rPPG estimator. Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with desired vital signs.
LGJul 28, 2025
Efficient Proxy Raytracer for Optical Systems using Implicit Neural RepresentationsShiva Sinaei, Chuanjun Zheng, Kaan Akşit et al.
Ray tracing is a widely used technique for modeling optical systems, involving sequential surface-by-surface computations, which can be computationally intensive. We propose Ray2Ray, a novel method that leverages implicit neural representations to model optical systems with greater efficiency, eliminating the need for surface-by-surface computations in a single pass end-to-end model. Ray2Ray learns the mapping between rays emitted from a given source and their corresponding rays after passing through a given optical system in a physically accurate manner. We train Ray2Ray on nine off-the-shelf optical systems, achieving positional errors on the order of 1μm and angular deviations on the order 0.01 degrees in the estimated output rays. Our work highlights the potential of neural representations as a proxy for optical raytracer.
CVMay 2, 2023
AutoColor: Learned Light Power Control for Multi-Color HologramsYicheng Zhan, Koray Kavaklı, Hakan Urey et al.
Multi-color holograms rely on simultaneous illumination from multiple light sources. These multi-color holograms could utilize light sources better than conventional single-color holograms and can improve the dynamic range of holographic displays. In this letter, we introduce AutoColor , the first learned method for estimating the optimal light source powers required for illuminating multi-color holograms. For this purpose, we establish the first multi-color hologram dataset using synthetic images and their depth information. We generate these synthetic images using a trending pipeline combining generative, large language, and monocular depth estimation models. Finally, we train our learned model using our dataset and experimentally demonstrate that AutoColor significantly decreases the number of steps required to optimize multi-color holograms from > 1000 to 70 iteration steps without compromising image quality.
OPTICSAug 1, 2021
Learned holographic light transportKoray Kavaklı, Hakan Urey, Kaan Akşit
Computer-Generated Holography (CGH) algorithms often fall short in matching simulations with results from a physical holographic display. Our work addresses this mismatch by learning the holographic light transport in holographic displays. Using a camera and a holographic display, we capture the image reconstructions of optimized holograms that rely on ideal simulations to generate a dataset. Inspired by the ideal simulations, we learn a complex-valued convolution kernel that can propagate given holograms to captured photographs in our dataset. Our method can dramatically improve simulation accuracy and image quality in holographic displays while paving the way for physically informed learning approaches.
HCJul 7, 2021
Telelife: The Future of Remote LivingJason Orlosky, Misha Sra, Kenan Bektaş et al.
In recent years, everyday activities such as work and socialization have steadily shifted to more remote and virtual settings. With the COVID-19 pandemic, the switch from physical to virtual has been accelerated, which has substantially affected various aspects of our lives, including business, education, commerce, healthcare, and personal life. This rapid and large-scale switch from in-person to remote interactions has revealed that our current technologies lack functionality and are limited in their ability to recreate interpersonal interactions. To help address these limitations in the future, we introduce "Telelife," a vision for the near future that depicts the potential means to improve remote living better aligned with how we interact, live and work in the physical world. Telelife encompasses novel synergies of technologies and concepts such as digital twins, virtual prototyping, and attention and context-aware user interfaces with innovative hardware that can support ultrarealistic graphics, user state detection, and more. These ideas will guide the transformation of our daily lives and routines soon, targeting the year 2035. In addition, we identify opportunities across high-impact applications in domains related to this vision of Telelife. Along with a recent survey of relevant fields such as human-computer interaction, pervasive computing, and virtual reality, the directions outlined in this paper will guide future research on remote living.
SYSep 15, 2020
Optical Gaze Tracking with Spatially-Sparse Single-Pixel DetectorsRichard Li, Eric Whitmire, Michael Stengel et al.
Gaze tracking is an essential component of next generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next generation displays are known to be lacking in one or multiple of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form-factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework are used for the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400Hz while demanding only 16mW. By simplifying the implementation to using only LEDs, duplexed as light transceivers, and more minimal machine learning model, namely a light-weight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
CVMar 18, 2020
Gaze-Sensing LEDs for Head Mounted DisplaysKaan Akşit, Jan Kautz, David Luebke
We introduce a new gaze tracker for Head Mounted Displays (HMDs). We modify two off-the-shelf HMDs to be gaze-aware using Light Emitting Diodes (LEDs). Our key contribution is to exploit the sensing capability of LEDs to create low-power gaze tracker for virtual reality (VR) applications. This yields a simple approach using minimal hardware to achieve good accuracy and low latency using light-weight supervised Gaussian Process Regression (GPR) running on a mobile device. With our hardware, we show that Minkowski distance measure based GPR implementation outperforms the commonly used radial basis function-based support vector regression (SVR) without the need to precisely determine free parameters. We show that our gaze estimation method does not require complex dimension reduction techniques, feature extraction, or distortion corrections due to off-axis optical paths. We demonstrate two complete HMD prototypes with a sample eye-tracked application, and report on a series of subjective tests using our prototypes.