Zhisheng Yan

CR
h-index: 7
4 papers
6 citations
Novelty: 51%
AI Score: 43

4 Papers

63.9 · IV · Mar 24 · Code
Viewport-based Neural 360° Image Compression

Jingwei Liao, Bo Chen, Klara Nahrstedt et al.

Given the popularity of 360° images on social media platforms, 360° image compression has become a critical technology for media storage and transmission. The conventional 360° image compression pipeline projects the spherical image onto a single 2D plane, leading to oversampling and distortion. In this paper, we propose a novel viewport-based neural compression pipeline for 360° images. By replacing the image projection in conventional 360° image compression pipelines with viewport extraction and efficiently compressing multiple viewports, the proposed pipeline minimizes the inherent oversampling and distortion issues. However, viewport extraction impedes information sharing between multiple viewports during compression, causing the loss of global information about the spherical image. To tackle this global information loss, we design a neural viewport codec that captures global prior information across multiple viewports and maximally compresses the viewport data. The viewport codec is empowered by a transformer-based ViewPort ConText (VPCT) module that can be integrated with canonical learning-based 2D image compression structures. We compare the proposed pipeline with existing 360° image compression models and with conventional 360° image compression pipelines built on learning-based 2D image codecs and standard hand-crafted codecs. Results show that our pipeline saves an average of 14.01% in bit consumption compared to the best-performing 360° image compression methods without compromising quality. The proposed VPCT-based codec also outperforms existing 2D image codecs within the viewport-based neural compression pipeline. Our code is available at: https://github.com/Jingwei-Liao/VPCT.
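The abstract describes replacing the single-plane projection with viewport extraction. A minimal sketch of that step follows, assuming the 360° image is stored in equirectangular (ERP) format and each viewport is a rectilinear (gnomonic) view defined by a horizontal field of view, a yaw, and a pitch: every viewport pixel is cast as a ray, rotated toward the viewing direction, converted to longitude/latitude, and looked up in the ERP image. The function names and parameters are hypothetical and are not taken from the VPCT repository; nearest-neighbour sampling is used only for brevity.

```python
import math


def viewport_to_erp(i, j, vp_w, vp_h, fov_deg, yaw_deg, pitch_deg, erp_w, erp_h):
    """Map viewport pixel (row i, column j) to fractional ERP coordinates (u, v).

    Camera frame: x points right, y up, z forward (the viewing direction).
    """
    # Focal length in pixels for the given horizontal field of view.
    f = 0.5 * vp_w / math.tan(math.radians(fov_deg) / 2)
    # Ray through the centre of pixel (i, j) on the tangent image plane.
    x = j + 0.5 - vp_w / 2
    y = vp_h / 2 - (i + 0.5)
    z = f
    # Tilt by pitch (rotation about x), then turn by yaw (rotation about y).
    p, t = math.radians(pitch_deg), math.radians(yaw_deg)
    y1 = y * math.cos(p) + z * math.sin(p)
    z1 = -y * math.sin(p) + z * math.cos(p)
    x2 = x * math.cos(t) + z1 * math.sin(t)
    z2 = -x * math.sin(t) + z1 * math.cos(t)
    # Ray direction -> longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = math.atan2(x2, z2)
    lat = math.atan2(y1, math.hypot(x2, z2))
    # Longitude/latitude -> ERP pixel coordinates (row 0 is the north pole).
    u = (lon / (2 * math.pi) + 0.5) * erp_w
    v = (0.5 - lat / math.pi) * erp_h
    return u, v


def extract_viewport(erp, vp_w, vp_h, fov_deg, yaw_deg, pitch_deg):
    """Sample a vp_h x vp_w viewport from an ERP image given as a list of rows."""
    erp_h, erp_w = len(erp), len(erp[0])
    out = []
    for i in range(vp_h):
        row = []
        for j in range(vp_w):
            u, v = viewport_to_erp(i, j, vp_w, vp_h, fov_deg, yaw_deg,
                                   pitch_deg, erp_w, erp_h)
            x = int(u) % erp_w                    # wrap around in longitude
            y = min(max(int(v), 0), erp_h - 1)    # clamp at the poles
            row.append(erp[y][x])
        out.append(row)
    return out
```

A complete pipeline would place several such viewports so that together they cover the sphere; how the paper chooses the number and placement of viewports is not stated in the abstract.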

CV · Oct 12, 2025
Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection

Gaojian Wang, Feng Lin, Tong Wu et al.

With abundant, unlabeled real faces, how can we learn robust and transferable facial representations that boost generalization across various face security tasks? We make a first attempt by proposing FS-VFM, a scalable self-supervised pre-training framework that learns fundamental representations of real face images. We introduce three learning objectives, collectively termed 3C, that synergize masked image modeling (MIM) and instance discrimination (ID), enabling FS-VFM to encode both local patterns and global semantics of real faces. Specifically, we formulate various facial masking strategies for MIM and devise a simple yet effective CRFR-P masking, which explicitly prompts the model to pursue meaningful intra-region Consistency and challenging inter-region Coherency. We present a reliable self-distillation mechanism that seamlessly couples MIM with ID to establish underlying local-to-global Correspondence. After pre-training, vanilla vision transformers (ViTs) serve as universal Vision Foundation Models for downstream Face Security tasks: cross-dataset deepfake detection, cross-domain face anti-spoofing, and unseen diffusion facial forensics. To efficiently transfer the pre-trained FS-VFM, we further propose FS-Adapter, a lightweight plug-and-play bottleneck atop the frozen backbone with a novel real-anchor contrastive objective. Extensive experiments on 11 public benchmarks demonstrate that FS-VFM consistently generalizes better than diverse VFMs (spanning natural and facial domains; fully, weakly, and self-supervised paradigms; and small, base, and large ViT scales) and even outperforms SOTA task-specific methods, while FS-Adapter offers an excellent efficiency-performance trade-off. The code and models are available at https://fsfm-3c.github.io/fsvfm.html.
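For context on the MIM objective, the sketch below shows the generic uniform random patch masking used in standard masked image modeling, for example hiding 75% of the 14 × 14 patch grid of a 224 × 224 image split into 16 × 16 patches. It is not the CRFR-P strategy, whose details the abstract does not give; the function name and parameters are hypothetical.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=None):
    """Uniform random patch masking as in generic MIM (not CRFR-P).

    Returns a grid_h x grid_w list of booleans; True marks a patch hidden
    from the encoder, which the model must then reconstruct.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n = grid_h * grid_w
    n_masked = int(n * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), n_masked))  # patch indices to hide
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]
```

A face-aware strategy would presumably replace the uniform draw with a choice guided by facial regions; how CRFR-P does this is not specified in the abstract.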

CR · Mar 24, 2016
Computationally Recoverable Camouflage: A Universal Model for Privacy-Aware Location-Based Services

Changsha Ma, Zhisheng Yan, Chang Wen Chen

With the prevalence of location-based services (LBSs) supported by advanced positioning technology, there has been a dramatic increase in the transmission of high-precision personal geographical data. Malicious use of these sensitive data threatens the privacy of LBS users. Although privacy in LBSs has received wide attention, related work has mostly focused on specific applications. Given the high diversity of LBSs, it is critical to build a universal model that can handle privacy protection for a broader range of applications. In this paper, we propose a Computationally Recoverable Camouflage (CRC) model, in which LBS users preserve privacy by reporting camouflaged location information and can flexibly trade off service quality against achieved privacy in different applications by adjusting the precision of the camouflage information. In particular, we propose a novel camouflage algorithm with a formal privacy guarantee that considers both location context and social context, enabling LBS users to scalably expose camouflage information along two dimensions. Furthermore, we apply the Scalable Ciphertext Policy Attribute-Based Encryption (SCP-ABE) algorithm to enforce fine-grained access control on the two-dimensionally scalable camouflage information. Through implementations on Android devices, we have demonstrated the computational efficiency of the proposed CRC model.
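The CRC model lets users trade service quality for privacy by adjusting the precision of reported location information. The abstract does not give the camouflage algorithm itself, so the sketch below uses a plainly different, generic technique, grid-based spatial generalization, only to illustrate what adjustable, nested precision can look like: a location is reported as a cell in a quadtree-like grid, and a fine cell can always be coarsened to any lower level. The function names and the level scheme are hypothetical; the paper's algorithm additionally considers social context, carries a formal privacy guarantee, and uses SCP-ABE to control who may recover which precision.

```python
def cloak(lat, lon, level):
    """Generalize (lat, lon) to a grid cell at the given precision level.

    Hypothetical scheme: level 0 is a single cell covering the globe; each
    additional level halves the cell size along both latitude and longitude.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    cells = 2 ** level
    # Clamp so that lat = 90 or lon = 180 falls into the last row/column.
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    return (level, row, col)


def coarsen(cell, target_level):
    """Derive the enclosing cell at a coarser level from a finer one."""
    level, row, col = cell
    shift = level - target_level
    if shift < 0:
        raise ValueError("cannot refine a cell to a finer level")
    return (target_level, row >> shift, col >> shift)


def cell_bounds(cell):
    """Return (lat_min, lon_min, lat_max, lon_max) of a cell in degrees."""
    level, row, col = cell
    cells = 2 ** level
    dlat, dlon = 180.0 / cells, 360.0 / cells
    return (-90.0 + row * dlat, -180.0 + col * dlon,
            -90.0 + (row + 1) * dlat, -180.0 + (col + 1) * dlon)
```

For example, a user could give a nearby-friends service a level-12 cell and a weather service only a level-4 cell derived from it with `coarsen`; this mirrors the service-quality/privacy trade-off described above, not the paper's mechanism.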

NI · Jan 18, 2015
Service Provisioning and Profit Maximization in Network-assisted Adaptive HTTP Streaming

Zhisheng Yan, Cedric Westphal, Xin Wang et al.

Adaptive HTTP streaming with centralized consideration of multiple streams has gained increasing interest. It poses a particular challenge: the interests of both the content provider and the network operator must be carefully balanced. More importantly, the adaptation strategy must be flexible enough to be ported to various systems that operate under different network environments, QoE levels, and economic objectives. To address these challenges, we propose a Markov Decision Process (MDP) based network-assisted adaptation framework in which the cost of buffering, significant playback variation, bandwidth management, and playback income are jointly considered. We then demonstrate its service provisioning and profit maximization for a mobile network in which fair or differentiated service is required.
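The abstract frames adaptation as an MDP that jointly weighs buffering cost, playback variation, bandwidth, and playback income. The toy sketch below solves such an MDP by value iteration for a single client. The state (buffer level, last bitrate), the delivery rule, the reward weights, and all names are hypothetical choices for illustration, not the paper's formulation.

```python
def solve_adaptation_mdp(rates, bw_dist, max_buf, discount=0.9, income=1.0,
                         bw_cost=0.2, stall_cost=5.0, switch_cost=0.5, tol=1e-6):
    """Value iteration for a toy bitrate-adaptation MDP (hypothetical model).

    State (b, k): b buffered segments (0..max_buf), k index of the last bitrate.
    Action a: index of the bitrate requested for the next segment.
    bw_dist: list of (bandwidth, probability) pairs, i.i.d. per segment.
    """
    n = len(rates)
    states = [(b, k) for b in range(max_buf + 1) for k in range(n)]

    def q_value(state, a, V):
        b, k_prev = state
        play = 1 if b > 0 else 0  # one segment is played if one is buffered
        total = 0.0
        for bw, prob in bw_dist:
            # Assumption: the segment arrives within one segment time iff bw >= rate.
            delivered = 1 if bw >= rates[a] else 0
            reward = (income * rates[a] * delivered                    # playback income
                      - bw_cost * rates[a]                             # bandwidth spent
                      - stall_cost * (1 - play)                        # rebuffering
                      - switch_cost * abs(rates[a] - rates[k_prev]))   # quality variation
            b_next = min(max_buf, b - play + delivered)
            total += prob * (reward + discount * V[(b_next, a)])
        return total

    V = {s: 0.0 for s in states}
    while True:
        new_V = {s: max(q_value(s, a, V) for a in range(n)) for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(range(n), key=lambda a: q_value(s, a, V)) for s in states}
    return V, policy
```

Changing the reward weights is where a balance between content-provider income and network-operator cost would enter; fair or differentiated service across multiple streams, as in the abstract, is not modeled in this single-client toy.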