CVMay 21
OPERA: An Agent for Image Restoration with End-to-End Joint Planning-Execution OptimizationFeng Zhu, Shuyang Xie, Yihan Zeng et al.
Real-world image restoration is challenging due to complex and interacting mixed degradations. Recent agent-based approaches address this problem by composing multiple task-specific restoration tools. However, empirical analysis reveals that their performance is fundamentally limited by implicitly constrained planning spaces and the lack of coordination among independently pretrained tools. To address these issues, we propose OPERA (Optimized Planning-Execution Restoration Agent), a framework that jointly optimizes restoration planning and tool execution in an end-to-end manner. On the planning side, OPERA uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward. On the execution side, OPERA introduces agent-guided co-training of restoration tools, enabling them to learn cooperative behaviors under sequential composition. Extensive experiments on multi-degradation benchmarks and real-world datasets demonstrate that OPERA consistently outperforms both all-in-one restoration models and existing agent-based methods across diverse and complex degradation scenarios.
OPTICSJun 17, 2025Code
A Lightweight Complex-Valued Deformable CNN for High-Quality Computer-Generated HolographyShuyang Xie, Jie Zhou, Bo Xu et al.
Holographic displays have significant potential in virtual reality and augmented reality owing to their ability to provide all the depth cues. Deep learning-based methods play an important role in computer-generated holography (CGH). During the diffraction process, each pixel exerts an influence on the reconstructed image. However, previous works face challenges in capturing sufficient information to accurately model this process, primarily due to the inadequacy of their effective receptive field (ERF). Here, we designed complex-valued deformable convolution for integration into network, enabling dynamic adjustment of the convolution kernel's shape to increase flexibility of ERF for better feature extraction. This approach allows us to utilize a single model while achieving state-of-the-art performance in both simulated and optical experiment reconstructions, surpassing existing open-source models. Specifically, our method has a peak signal-to-noise ratio that is 2.04 dB, 5.31 dB, and 9.71 dB higher than that of CCNN-CGH, HoloNet, and Holo-encoder, respectively, when the resolution is 1920$\times$1072. The number of parameters of our model is only about one-eighth of that of CCNN-CGH.
OPTICSDec 13, 2025
JPEG-Inspired Cloud-Edge HolographyShuyang Xie, Jie Zhou, Jun Wang et al.
Computer-generated holography (CGH) presents a transformative solution for near-eye displays in augmented and virtual reality. Recent advances in deep learning have greatly improved CGH in reconstructed quality and computational efficiency. However, deploying neural CGH pipelines directly on compact, eyeglass-style devices is hindered by stringent constraints on computation and energy consumption, while cloud offloading followed by transmission with natural image codecs often distorts phase information and requires high bandwidth to maintain reconstruction quality. Neural compression methods can reduce bandwidth but impose heavy neural decoders at the edge, increasing inference latency and hardware demand. In this work, we introduce JPEG-Inspired Cloud-Edge Holography, an efficient pipeline designed around a learnable transform codec that retains the block-structured and hardware-friendly nature of JPEG. Our system shifts all heavy neural processing to the cloud, while the edge device performs only lightweight decoding without any neural inference. To further improve throughput, we implement custom CUDA kernels for entropy coding on both cloud and edge. This design achieves a peak signal-to-noise ratio of 32.15 dB at $<$ 2 bits per pixel with decode latency as low as 4.2 ms. Both numerical simulations and optical experiments confirm the high reconstruction quality of the holograms. By aligning CGH with a codec that preserves JPEG's structural efficiency while extending it with learnable components, our framework enables low-latency, bandwidth-efficient hologram streaming on resource-constrained wearable devices-using only simple block-based decoding readily supported by modern system-on-chips, without requiring neural decoders or specialized hardware.
OPTICSJul 30, 2025
Eyepiece-free pupil-optimized holographic near-eye displaysJie Zhou, Shuyang Xie, Yang Wu et al.
Computer-generated holography (CGH) represents a transformative visualization approach for next-generation immersive virtual and augmented reality (VR/AR) displays, enabling precise wavefront modulation and naturally providing comprehensive physiological depth cues without the need for bulky optical assemblies. Despite significant advancements in computational algorithms enhancing image quality and achieving real-time generation, practical implementations of holographic near-eye displays (NEDs) continue to face substantial challenges arising from finite and dynamically varying pupil apertures, which degrade image quality and compromise user experience. In this study, we introduce an eyepiece-free pupil-optimized holographic NED. Our proposed method employs a customized spherical phase modulation strategy to generate multiple viewpoints within the pupil, entirely eliminating the dependence on conventional optical eyepieces. Through the joint optimization of amplitude and phase distributions across these viewpoints, the method markedly mitigates image degradation due to finite pupil sampling and resolves inapparent depth cues induced by the spherical phase. The demonstrated method signifies a substantial advancement toward the realization of compact, lightweight, and flexible holographic NED systems, fulfilling stringent requirements for future VR/AR display technologies.