CV · Sep 20, 2023
Light Field Diffusion for Single-View Novel View Synthesis
Yifeng Xiong, Haoyu Ma, Shanlin Sun et al. · meta-ai
Single-view novel view synthesis (NVS), the task of generating images from new viewpoints based on a single reference image, is an important but challenging problem in computer vision. Recent advances in NVS have leveraged Denoising Diffusion Probabilistic Models (DDPMs) for their exceptional ability to produce high-fidelity images. However, current diffusion-based methods typically use camera pose matrices to enforce 3D constraints globally and implicitly, which can lead to inconsistencies among images generated from different viewpoints, particularly in regions with complex textures and structures. To address these limitations, we present Light Field Diffusion (LFD), a novel conditional diffusion-based approach that moves beyond the conventional reliance on camera pose matrices. Starting from the camera pose matrices, LFD transforms them into a light field encoding that has the same shape as the reference image and describes the direction of each ray. By integrating the light field encoding with the reference image, our method imposes local pixel-wise constraints within the diffusion process, fostering enhanced view consistency. We train an image LFD model on the ShapeNet Car dataset and also fine-tune a pre-trained latent diffusion model on the Objaverse dataset. The resulting latent LFD model exhibits remarkable zero-shot generalization to out-of-distribution datasets such as RTMV, as well as to in-the-wild images. Experiments demonstrate that LFD not only produces high-fidelity images but also achieves superior 3D consistency in complex regions, outperforming existing novel view synthesis methods.
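To make "a light field encoding with the same shape as the reference image" concrete, the sketch below turns a camera pose into one unit ray direction per pixel. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a camera-to-world rotation matrix; the helper name `light_field_encoding` is hypothetical, and LFD's actual encoding (for example, whether it also includes the ray origin or a positional encoding) may differ.

```python
import numpy as np

def light_field_encoding(height, width, fx, fy, cx, cy, cam_to_world_rot):
    """Return an (H, W, 3) array of unit ray directions in world coordinates.

    Hypothetical helper: assumes a pinhole camera and encodes each pixel
    by the direction of the ray through its center.
    """
    # Pixel-center coordinates (u + 0.5, v + 0.5); meshgrid gives (H, W) arrays.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    # Back-project through the inverse intrinsics into camera space (z = 1).
    dirs_cam = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # Rotate into world space: each row vector d becomes R @ d.
    dirs_world = dirs_cam @ np.asarray(cam_to_world_rot).T
    # Normalize so every pixel stores a unit direction.
    return dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

rays = light_field_encoding(5, 5, fx=4.0, fy=4.0, cx=2.5, cy=2.5, cam_to_world_rot=np.eye(3))
image = np.zeros((5, 5, 3))
# Same spatial shape as the image, so the two can be stacked channel-wise.
conditioning = np.concatenate([image, rays], axis=-1)
```

Because the encoding is per pixel, concatenating it with the reference image gives the denoiser a local, pixel-aligned description of the target view, rather than a single global pose vector.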
IV · Apr 8, 2023
MedGen3D: A Deep Generative Framework for Paired 3D Image and Mask Generation
Kun Han, Yifeng Xiong, Chenyu You et al.
Acquiring and annotating sufficient labeled data is crucial for developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper, we present MedGen3D, a deep generative framework that can generate paired 3D medical images and masks. First, we represent the 3D medical data as 2D sequences and propose the Multi-Condition Diffusion Probabilistic Model (MC-DPM) to generate multi-label mask sequences that adhere to anatomical geometry. Then, we use an image sequence generator and a semantic diffusion refiner, both conditioned on the generated mask sequences, to produce realistic 3D medical images that align with the generated masks. Our proposed framework guarantees accurate alignment between synthetic images and segmentation maps. Experiments on 3D thoracic CT and brain MRI datasets show that our synthetic data is both diverse and faithful to the original data, and demonstrate its benefits for downstream segmentation tasks. We anticipate that MedGen3D's ability to synthesize paired 3D medical images and masks will prove valuable in training deep learning models for medical imaging tasks.
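The abstract represents a 3D volume as 2D sequences but does not say how the sequences are cut. The sketch below shows one plausible way, assuming axial slicing into fixed-length windows of consecutive slices; `volume_to_slice_sequences`, `seq_len`, and `stride` are hypothetical and are not values taken from MedGen3D.

```python
import numpy as np

def volume_to_slice_sequences(volume, seq_len, stride):
    """Split a (D, H, W) volume into sequences of consecutive 2D slices.

    Hypothetical helper: seq_len and stride are illustrative choices.
    """
    depth = volume.shape[0]
    if seq_len > depth:
        raise ValueError("seq_len cannot exceed the number of slices")
    starts = range(0, depth - seq_len + 1, stride)
    # Each sequence is a (seq_len, H, W) stack of neighboring axial slices.
    return [volume[s:s + seq_len] for s in starts]

volume = np.random.default_rng(0).random((12, 64, 64))
sequences = volume_to_slice_sequences(volume, seq_len=4, stride=2)
```

With a stride smaller than `seq_len`, neighboring sequences overlap, which is one simple way to let a sequence generator see anatomical context that spans window boundaries.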
CV · Apr 6, 2022
Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks
Xu Han, Anmin Liu, Yifeng Xiong et al.
Deep neural networks have been shown to be highly vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to benign inputs. After impressive attack success rates were achieved in the white-box setting, attention shifted to black-box attacks. In either case, common gradient-based approaches generally apply the $sign$ function to generate perturbations at the end of the process. However, only a few works have examined the limitations of the $sign$ function. The deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimates and suboptimal solutions for adversarial transferability, which is crucial for black-box attacks. To address this issue, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM) to improve the transferability of crafted adversarial examples. Specifically, we use data rescaling to replace the inefficient $sign$ function in gradient-based attacks without extra computational cost. We also propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method can be used in any gradient-based optimization and can be integrated with various input transformation or ensemble methods to further improve adversarial transferability. Extensive experiments on the standard ImageNet dataset show that S-FGRM significantly boosts the transferability of gradient-based attacks and outperforms state-of-the-art baselines.
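The deviation between the gradient and its $sign$ can be measured directly. The sketch below compares the cosine similarity of `sign(grad)` and of a rescaled gradient against the true gradient. The rescaling used here (divide by the mean absolute value, then clip) is an illustrative stand-in chosen for simplicity, not the exact S-FGRM formula, and Depth First Sampling is not modeled.

```python
import numpy as np

def sign_step(grad):
    # Conventional FGSM-style update direction.
    return np.sign(grad)

def rescaled_step(grad, clip=2.0):
    # Illustrative rescaling (NOT the exact S-FGRM formula): scale by the mean
    # absolute gradient so the step size is comparable to sign(), then clip
    # to bound each coordinate's step.
    scale = np.mean(np.abs(grad)) + 1e-12
    return np.clip(grad / scale, -clip, clip)

def cosine(a, b):
    return float(np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b)))

grad = np.random.default_rng(0).standard_normal(10_000)
print(cosine(sign_step(grad), grad))
print(cosine(rescaled_step(grad), grad))
```

For Gaussian-distributed gradients the expected cosine similarity between $sign(g)$ and $g$ is $\sqrt{2/\pi} \approx 0.80$, so roughly a fifth of the directional information is lost; any rescaling that preserves relative magnitudes recovers part of it.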
CV · Jan 28, 2023
Semantic Adversarial Attacks on Face Recognition through Significant Attributes
Yasmeen M. Khedr, Yifeng Xiong, Kun He
Face recognition is known to be vulnerable to adversarial face images. Existing works craft adversarial face images by indiscriminately changing a single attribute, without accounting for the intrinsic attributes of each image. To address this, we propose a new semantic adversarial attack, SAA-StarGAN, that tampers with the most significant facial attributes of each image. We predict the most significant attributes using either cosine similarity or a probability score. The probability-score method trains a face verification model on an attribute prediction task to obtain a class probability score for each attribute. This prediction step makes crafting adversarial face images easier and more efficient, and it also improves adversarial transferability. We then change the most significant facial attributes, one or several at a time, for impersonation and dodging attacks in white-box and black-box settings. Experimental results show that our method generates diverse and realistic adversarial face images without affecting how humans perceive the identity of the face. SAA-StarGAN achieves an 80.5% attack success rate against black-box models, outperforming existing methods by 35.5% under the impersonation attack. In the black-box setting, SAA-StarGAN achieves high attack success rates across various models. The experiments confirm that predicting the most important attributes significantly affects the success of adversarial attacks in both white-box and black-box settings and can enhance the transferability of the crafted adversarial examples.
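The cosine-similarity variant of attribute prediction can be sketched as a ranking problem. The code below assumes we already have a face-recognition embedding of the original image and of versions with exactly one attribute edited (for example by StarGAN); it treats a lower similarity to the original as a more significant attribute. The helper name, attribute names, and toy embeddings are all hypothetical.

```python
import numpy as np

def rank_attributes_by_cosine(original_emb, edited_embs):
    """Rank attributes by how far editing them moves a face embedding.

    Hypothetical helper: edited_embs maps an attribute name to the embedding
    of the image with only that attribute changed. Lower cosine similarity
    to the original embedding is treated as more significant.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cos(original_emb, emb) for name, emb in edited_embs.items()}
    return sorted(scores, key=scores.get)  # most significant first

original = np.array([1.0, 0.0, 0.0])
edited = {
    "hair_color": np.array([0.9, 0.1, 0.0]),
    "eyeglasses": np.array([0.2, 0.9, 0.1]),
    "smiling": np.array([0.7, 0.5, 0.0]),
}
print(rank_attributes_by_cosine(original, edited))  # ['eyeglasses', 'smiling', 'hair_color']
```

The top-ranked attributes would then be the ones edited to craft the impersonation or dodging example.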
IT · Mar 30
Simultaneous Sensing Data Acquisition and Sharing in Low-Altitude Wireless Networks: Fundamental Limits and Optimal Signaling
Fuwang Dong, Fan Liu, Yifeng Xiong et al.
In low-altitude wireless networks, simultaneous sensing data acquisition and sharing (SDAS) through an integrated sensing and communications (ISAC) signaling strategy is a typical application scenario. In this paper, we investigate three primary aspects of the SDAS system: the information-theoretic framework, the optimal distribution of the channel input, and the optimal waveform design for Gaussian signaling. First, we establish the information-theoretic framework and develop a modified source-channel separation theorem (MSST) tailored to SDAS systems. The proposed MSST elucidates the relationship between achievable distortion, coding rate, and communication channel capacity in cases where the distortion metric is separable for the sensing and communication (S&C) processes. Second, we present an optimal channel input design for dual-functional signaling, which aims to minimize SDAS distortion under the constraints of the MSST and the resource budget. We then devise a two-step Blahut-Arimoto (BA)-based optimal search algorithm to numerically solve the resulting functional optimization problem. Third, to provide practical design insights, we propose an optimal waveform design for Gaussian signaling in multi-input multi-output (MIMO) SDAS systems. The associated covariance matrix optimization problem is addressed using a successive convex approximation (SCA)-based waveform design algorithm. Finally, we provide numerical simulation results that demonstrate the effectiveness of the proposed algorithms and characterize the unique performance tradeoff between the S&C processes.
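The paper's search algorithm is built on Blahut-Arimoto iterations. For reference, the sketch below is the classic single-step BA algorithm for the capacity of a discrete memoryless channel; it is the textbook building block, not the paper's two-step SDAS variant, which additionally handles the MSST and resource-budget constraints.

```python
import numpy as np

def kl_rows(channel, q):
    """D(P(.|x) || q) in nats for every input x, treating 0 * log 0 as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(channel > 0, np.log(channel / q), 0.0)
    return np.sum(channel * log_ratio, axis=1)

def blahut_arimoto(channel, tol=1e-10, max_iter=10_000):
    """Capacity (bits) and optimal input distribution of a DMC.

    channel[x, y] = P(y | x); each row must sum to 1.
    """
    p = np.full(channel.shape[0], 1.0 / channel.shape[0])  # uniform start
    for _ in range(max_iter):
        d = kl_rows(channel, p @ channel)  # divergence of each row from P(y)
        p_new = p * np.exp(d)              # multiplicative BA update
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    capacity_nats = float(p @ kl_rows(channel, p @ channel))
    return capacity_nats / np.log(2), p

bsc = np.array([[0.9, 0.1], [0.1, 0.9]])  # binary symmetric channel, crossover 0.1
z_channel = np.array([[1.0, 0.0], [0.5, 0.5]])
cap_bsc, p_bsc = blahut_arimoto(bsc)
cap_z, p_z = blahut_arimoto(z_channel)
```

For the binary symmetric channel the result should match the closed form $1 - h_2(0.1)$, and for the Z-channel with crossover 0.5 it should match $\log_2 1.25$ with $P(x=1) = 0.4$.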
IT · May 14
CP-OFDM Achieves Lower Ranging CRB Than Frequency-Spread Waveforms in the Large-Sample Regime
Fan Liu, Yifeng Xiong, Ya-Feng Liu et al.
The inherent randomness of communication symbols creates a fundamental tension in Integrated Sensing and Communications (ISAC). On the one hand, these symbols carry data while allowing sensing to fully reuse communication resources. On the other hand, their randomness induces waveform-dependent fluctuations that directly affect sensing accuracy. This paper investigates a foundational question arising from this tradeoff: how does the modulation waveform affect the ranging Cramér-Rao Bound (CRB) when sensing reuses random data symbols? We address this question by revealing a structural factorization of the Fisher information matrix (FIM) for joint delay-amplitude estimation, which separates the deterministic Jacobian of the target geometry from the random frequency-domain signal power induced by the data symbols. This structure yields a Jensen-type universal lower bound on the CRB, which is exactly attained by CP-OFDM under PSK constellations. For QAM and broader sub-Gaussian constellations, we develop an asymptotic perturbation analysis of the inverse FIM and prove that, when the number of transmitted symbols $N$ grows large, CP-OFDM achieves a lower ranging CRB than any frequency-spread orthogonal waveform over the almost-sure event where the random FIM is invertible. This superiority is further extended to amplitude estimation and full joint delay-amplitude estimation. We also characterize the local geometry of the stochastic CRB minimization problem over the unitary group. The analysis reveals that CP-OFDM is a stationary point for finite $N$ and that its Riemannian Hessian is positive semidefinite for sufficiently large $N$, establishing its asymptotic local optimality. Numerical results confirm that OFDM outperforms representative waveforms including SC, OTFS, and AFDM.
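The Jensen-type bound can be illustrated numerically. The sketch below simplifies the paper's joint delay-amplitude FIM to a scalar delay-only FIM with known amplitude, $J = 2\,\mathrm{SNR} \sum_k (2\pi f_k)^2 |X_k|^2$, on a hypothetical normalized subcarrier grid. With QPSK on the subcarriers (CP-OFDM), $|X_k|^2 = 1$, so $J$ is deterministic and the CRB equals $1/\mathbb{E}[J]$. With the same symbols sent in time and spread by a unitary DFT (single carrier), $|X_k|^2$ fluctuates, and by Jensen's inequality $\mathbb{E}[1/J] > 1/\mathbb{E}[J]$.

```python
import numpy as np

def delay_fisher_info(freq_power, freqs, snr=1.0):
    # Simplified scalar delay FIM with known amplitude (an assumption):
    # J = 2 * snr * sum_k (2*pi*f_k)^2 * |X_k|^2
    return 2.0 * snr * np.sum((2 * np.pi * freqs) ** 2 * freq_power, axis=-1)

rng = np.random.default_rng(0)
n, trials = 64, 2000
freqs = np.arange(n) - n / 2  # hypothetical normalized subcarrier grid
# Unit-power QPSK data symbols, one row per Monte Carlo trial.
qpsk = (rng.choice([-1.0, 1.0], (trials, n)) + 1j * rng.choice([-1.0, 1.0], (trials, n))) / np.sqrt(2)

# CP-OFDM: symbols sit directly on subcarriers, so |X_k|^2 = 1 in every trial.
crb_ofdm = 1.0 / delay_fisher_info(np.abs(qpsk) ** 2, freqs)
# Single carrier: symbols are sent in time; a unitary DFT spreads them in frequency.
x_sc = np.fft.fft(qpsk, axis=-1) / np.sqrt(n)
crb_sc = 1.0 / delay_fisher_info(np.abs(x_sc) ** 2, freqs)

# Jensen lower bound 1 / E[J]; E|X_k|^2 = 1 for both waveforms.
jensen_bound = 1.0 / delay_fisher_info(np.ones(n), freqs)
print(crb_ofdm.mean() / jensen_bound, crb_sc.mean() / jensen_bound)
```

This only reproduces the PSK case, where the bound is attained exactly; the paper's large-$N$ result for QAM and sub-Gaussian constellations requires its perturbation analysis and is not shown here.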
LG · Nov 21, 2021
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability
Yifeng Xiong, Jiadong Lin, Min Zhang et al.
Black-box adversarial attacks have attracted considerable attention for their practical relevance to deep learning security. At the same time, they are very challenging because the attacker has no access to the network architecture or internal weights of the target model. Based on the hypothesis that an example that remains adversarial for multiple models is more likely to transfer its attack capability to other models, ensemble-based adversarial attack methods are efficient and widely used for black-box attacks. However, the ways of performing ensemble attacks have been little investigated, and existing ensemble attacks simply fuse the outputs of all models evenly. In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients across different models may lead to poor local optima. To this end, we propose a novel attack method called the stochastic variance reduced ensemble (SVRE) attack, which reduces the gradient variance of the ensemble models and takes full advantage of the ensemble attack. Empirical results on the standard ImageNet dataset demonstrate that the proposed method boosts adversarial transferability and significantly outperforms existing ensemble attacks. Code is available at https://github.com/JHL-HUST/SVRE.
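The variance-reduction idea follows the SVRG pattern: an outer loop computes a full ensemble gradient, and an inner loop samples one model at a time and corrects its gradient with that model's gradient at the outer iterate. The sketch below uses toy differentiable losses in place of real networks, averages per-model gradients as the "ensemble gradient" (the paper fuses model outputs), and reuses the outer step size in the inner loop; all of these are assumptions. See the linked repository for the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_models = 16, 4
# Hypothetical surrogate "models": toy losses f_k(x) = sum(tanh(W_k x)).
weights = [0.5 * rng.standard_normal((8, dim)) for _ in range(n_models)]

def loss(k, x):
    return float(np.sum(np.tanh(weights[k] @ x)))

def grad(k, x):
    # Analytic gradient of sum(tanh(W_k x)) with respect to x.
    return weights[k].T @ (1.0 - np.tanh(weights[k] @ x) ** 2)

def ensemble_loss(x):
    return float(np.mean([loss(k, x) for k in range(n_models)]))

def l1_normalize(g):
    return g / (np.sum(np.abs(g)) + 1e-12)

def svre_attack(x, eps=0.05, outer_steps=10, inner_steps=4, mu=1.0):
    alpha = eps / outer_steps
    x_adv, g_outer = x.copy(), np.zeros_like(x)
    for _ in range(outer_steps):
        # Ensemble gradient at the outer iterate (plain average, an assumption).
        g_ens = np.mean([grad(k, x_adv) for k in range(n_models)], axis=0)
        x_inner, g_inner = x_adv.copy(), np.zeros_like(x)
        for _ in range(inner_steps):
            k = rng.integers(n_models)
            # Variance-reduced estimate: one model's gradient, corrected by its
            # gradient at the outer iterate, plus the full ensemble gradient.
            g_vr = grad(k, x_inner) - grad(k, x_adv) + g_ens
            g_inner = mu * g_inner + l1_normalize(g_vr)
            x_inner = np.clip(x_inner + alpha * np.sign(g_inner), x - eps, x + eps)
        # The outer momentum update uses the gradient accumulated in the inner loop.
        g_outer = mu * g_outer + l1_normalize(g_inner)
        x_adv = np.clip(x_adv + alpha * np.sign(g_outer), x - eps, x + eps)
    return x_adv

x0 = 0.1 * rng.standard_normal(dim)
x_adv = svre_attack(x0)
print(ensemble_loss(x0), ensemble_loss(x_adv))
```

The correction term `grad(k, x_inner) - grad(k, x_adv)` is zero-mean over the random choice of `k` when `x_inner == x_adv`, so the inner estimate stays centered on the ensemble gradient while its per-model variance shrinks.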
IT · Apr 29
Input Distribution Design for Ranging-Oriented OFDM-ISAC Systems Under Frequency-Selective Fading
Weijiang Zhao, Yifeng Xiong
The implementation of the integrated sensing and communications (ISAC) feature in sixth-generation (6G) networks is most likely to be based on the orthogonal frequency-division multiplexing (OFDM) framework. Input distribution design, or constellation design, is a crucial technique in OFDM-ISAC systems, enabling a favorable balance between communication rate and sensing performance. In this treatise, we propose a computationally efficient input distribution design approach for OFDM-ISAC under frequency-selective channels, following the theoretical framework of capacity-distortion. We highlight that under practical sensing constraints, the optimal strategy is to treat the kurtosis of constellations as a resource and allocate it appropriately across subcarriers.
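Treating kurtosis as a resource presumes a way to compute it. The sketch below evaluates the normalized fourth moment $\kappa = \mathbb{E}|x|^4 / (\mathbb{E}|x|^2)^2$ of a discrete constellation with optional nonuniform probabilities. The helper name is hypothetical; the paper's allocation algorithm itself is not reproduced.

```python
import numpy as np

def kurtosis(points, probs=None):
    """Normalized fourth moment E|x|^4 / (E|x|^2)^2 of a complex constellation."""
    points = np.asarray(points, dtype=complex)
    probs = np.full(points.size, 1.0 / points.size) if probs is None else np.asarray(probs)
    second = np.sum(probs * np.abs(points) ** 2)
    fourth = np.sum(probs * np.abs(points) ** 4)
    return float(fourth / second ** 2)

qpsk = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))
levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()
print(kurtosis(qpsk), kurtosis(qam16))
```

PSK has kurtosis exactly 1 and uniform 16-QAM has 1.32, while a complex Gaussian input has 2. Since $\mathrm{Var}(|x|^2) = (\kappa - 1)(\mathbb{E}|x|^2)^2$, kurtosis directly measures how much the per-subcarrier power fluctuates, which is the quantity that trades off against communication rate.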
CL · Oct 14, 2025
OPLoRA: Orthogonal Projection LoRA Prevents Catastrophic Forgetting during Parameter-Efficient Fine-Tuning
Yifeng Xiong, Xiaohui Xie
Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large language models but suffers from catastrophic forgetting when learned updates interfere with the dominant singular directions that encode essential pre-trained knowledge. We propose Orthogonal Projection LoRA (OPLoRA), a theoretically grounded approach that prevents this interference through double-sided orthogonal projections. By decomposing frozen weights via SVD, OPLoRA constrains LoRA updates to lie entirely within the orthogonal complement of the top-$k$ singular subspace using projections $P_L = I - U_k U_k^\top$ and $P_R = I - V_k V_k^\top$. We prove that this construction exactly preserves the top-$k$ singular triples, providing mathematical guarantees for knowledge retention. To quantify subspace interference, we introduce $\rho_k$, a metric measuring update alignment with dominant directions. Extensive experiments across commonsense reasoning, mathematics, and code generation demonstrate that OPLoRA significantly reduces forgetting while maintaining competitive task-specific performance on LLaMA-2 7B and Qwen2.5 7B, establishing orthogonal projection as an effective mechanism for knowledge preservation in parameter-efficient fine-tuning.
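The preservation claim can be checked numerically. The sketch below builds $P_L$ and $P_R$ from the SVD of a random frozen weight, applies them on both sides of a LoRA update ($\Delta W = P_L B A P_R$, our reading of "double-sided"), and verifies that each top-$k$ triple still satisfies $W' v_i = \sigma_i u_i$ and $u_i^\top W' = \sigma_i v_i^\top$. Matrix sizes are hypothetical, random `B` and `A` stand in for trained LoRA factors, and $\rho_k$ is not computed because the abstract does not define it.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, k = 32, 24, 4, 6  # hypothetical sizes

W = rng.standard_normal((d_out, d_in))  # frozen pre-trained weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_k, V_k = U[:, :k], Vt[:k].T           # top-k left/right singular vectors

P_L = np.eye(d_out) - U_k @ U_k.T       # removes top-k left directions
P_R = np.eye(d_in) - V_k @ V_k.T        # removes top-k right directions

# Random LoRA factors standing in for trained values.
B = rng.standard_normal((d_out, rank))
A = rng.standard_normal((rank, d_in))
delta = P_L @ B @ A @ P_R               # double-sided projected update
W_new = W + delta

# Top-k triples of W remain singular triples of W_new.
preserved_right = np.allclose(W_new @ V_k, U_k * S[:k])
preserved_left = np.allclose(U_k.T @ W_new, S[:k, None] * V_k.T)
print(preserved_right, preserved_left)
```

The check works because $P_R V_k = 0$ and $U_k^\top P_L = 0$, so the update annihilates the dominant subspace from both sides; projecting on only one side would preserve only one of the two equations.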
CV · Aug 20, 2025
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang et al.
While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference speed. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes while achieving substantially faster inference speed compared to other diffusion-based methods. We also demonstrate that Ouroboros can transfer to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.
CL · Sep 13, 2021
Detecting Textual Adversarial Examples through Randomized Substitution and Vote
Xiaosen Wang, Yifeng Xiong, Kun He
A line of work has shown that natural language processing models are vulnerable to adversarial examples. Correspondingly, various defense methods have been proposed to mitigate the threat of textual adversarial examples, e.g., adversarial training, input transformations, and detection. In this work, we treat the optimization process of synonym-substitution-based textual adversarial attacks as a specific sequence of word replacements in which each word mutually influences the others. We find that randomly substituting words with their synonyms can destroy this mutual interaction and eliminate the adversarial perturbation. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes on the prediction label by accumulating the logits of k samples generated by randomly substituting the words in the input text with synonyms. RS&V is applicable to any existing neural network without architectural modification or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that RS&V detects textual adversarial examples more successfully than existing detection methods while maintaining high classification accuracy on benign samples.
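The vote itself is simple to sketch. The code below generates `k` randomized copies of the input, substitutes each word that has synonyms with probability `sub_rate`, sums the logits, and flags the input when the voted label differs from the original prediction. That flagging rule, the toy lexicon classifier, and the synonym table are all hypothetical stand-ins for a real model and synonym source.

```python
import numpy as np

# Toy lexicon classifier and synonym table: both hypothetical stand-ins.
WEIGHTS = {"good": 1.0, "great": 1.0, "fine": 1.0, "decent": -3.0}
SYNONYMS = {
    "good": ["good", "great", "fine"],
    "decent": ["good", "great", "fine"],
}

def logits(words):
    score = sum(WEIGHTS.get(w, 0.0) for w in words)
    return np.array([-score, score])  # [negative, positive]

def rs_and_v(text, k=25, sub_rate=0.5, rng=None):
    """Return (voted_label, is_adversarial) for a whitespace-tokenized text."""
    rng = np.random.default_rng() if rng is None else rng
    words = text.split()
    original_label = int(np.argmax(logits(words)))
    total = np.zeros(2)
    for _ in range(k):
        # Replace each word that has synonyms with probability sub_rate.
        sample = [
            rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < sub_rate else w
            for w in words
        ]
        total += logits(sample)  # accumulate logits over the randomized copies
    voted_label = int(np.argmax(total))
    return voted_label, voted_label != original_label

print(rs_and_v("the movie was good", sub_rate=1.0, rng=np.random.default_rng(0)))
print(rs_and_v("the movie was decent", sub_rate=1.0, rng=np.random.default_rng(0)))
```

Here "decent" plays the role of an adversarial substitution that exploits a quirk of the classifier; randomly swapping it back toward ordinary synonyms restores the original label, so the vote disagrees with the attacked prediction and the input is flagged, while the benign sentence keeps its label.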