Lingdong Wang

CV
5papers
18citations
Novelty62%
AI Score52

5 Papers

SYOct 15, 2023Code
BONES: Near-Optimal Neural-Enhanced Video Streaming

Lingdong Wang, Simran Singh, Jacob Chakareski et al.

Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without violating the playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Comprehensive experimental results indicate that BONES increases QoE by 5\% to 20\% over state-of-the-art algorithms with minimal overhead. Our code is available at https://github.com/UMass-LIDS/bones.

CVSep 12, 2022Code
CU-Net: Real-Time High-Fidelity Color Upsampling for Point Clouds

Lingdong Wang, Mohammad Hajiesmaili, Jacob Chakareski et al.

Point cloud upsampling is essential for high-quality augmented reality, virtual reality, and telepresence applications, due to the capture, processing, and communication limitations of existing technologies. Although geometry upsampling to densify a point cloud's coordinates has been well studied, the upsampling of the color attributes has been largely overlooked. In this paper, we propose CU-Net, the first deep-learning point cloud color upsampling model that enables low latency and high visual fidelity operation. CU-Net achieves linear time and space complexity by leveraging a feature extractor based on sparse convolution and a color prediction module based on neural implicit function. Therefore, CU-Net is theoretically guaranteed to be more efficient than most existing methods with quadratic complexity. Experimental results demonstrate that CU-Net can colorize a photo-realistic point cloud with nearly a million points in real time, while having notably better visual performance than baselines. Besides, CU-Net can adapt to arbitrary upsampling ratios and unseen objects without retraining. Our source code is available at https://github.com/UMass-LIDS/cunet.

78.5CVApr 6
Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Lingdong Wang, Guan-Ming Su, Divya Kothandaraman et al.

Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy and human perception. We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. The source video is decomposed into three compact modalities: a textual description, a spatiotemporally degraded video, and optional sketches or poses that respectively capture semantic, appearance, and motion cues. A conditional video diffusion model then reconstructs high-quality, temporally coherent videos from these compact representations. Temporal forward filling, token interleaving, and modality-specific codecs are proposed to improve multimodal generation and modality compactness. Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates.

93.4IVMay 18
CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

Tung-I Chen, Lingdong Wang, Subhransu Maji et al.

Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.

67.7NIApr 30
ReVo: A Cross-Layer Reliable Volumetric Videoconferencing System

Ankur Aditya, Diptyaroop Maji, Lingdong Wang et al.

Volumetric videoconferencing enables immersive six Degrees of Freedom interactions by jointly transmitting visual appearance and 3D geometry. However, delivering volumetric video over today's networks remains challenging due to high bandwidth demands, strict real-time latency constraints, and frequent packet loss. Packet loss not only degrades visual quality but also corrupts geometric structure, leading to severe artifacts and video freezes that significantly degrade Quality of Experience. Existing solutions either optimize volumetric videos assuming reliable networks or focus on loss recovery for 2D video, and are insufficient for volumetric videoconferencing. In this paper, we present ReVo, a loss-resilient volumetric videoconferencing system that jointly recovers RGB and depth content under packet loss while meeting real-time constraints on desktop-grade hardware. ReVo leverages the insight that effective recovery requires a cross-layer, modality-aware design. It decouples volumetric video into RGB and depth streams, selectively protects critical content using network-layer FEC, and reconstructs corrupted non-critical frames using a post-decode neural recovery module. ReVo is implemented end-to-end over WebRTC and supports both traditional and neural video codecs. Our evaluations using real-world loss traces show that ReVo improves median SSIM by up to 32% (resp. 13%) for RGB (resp. depth) content and reduces video freezes by up to 95.7% compared to existing techniques.