Zhixun Su

CV
h-index8
21papers
480citations
Novelty51%
AI Score47

21 Papers

CVAug 17, 2023Code
Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration

Liyan Wang, Qinyu Yang, Cong Wang et al.

Recent years have witnessed the remarkable performance of diffusion models in various vision tasks. However, for image restoration that aims to recover clear images with sharper details from given degraded observations, diffusion-based methods may fail to recover promising results due to inaccurate noise estimation. Moreover, simple constraining noises cannot effectively learn complex degradation information, which subsequently hinders the model capacity. To solve the above problems, we propose a coarse-to-fine diffusion Transformer (C2F-DFT) for image restoration. Specifically, our C2F-DFT contains diffusion self-attention (DFSA) and diffusion feed-forward network (DFN) within a new coarse-to-fine training scheme. The DFSA and DFN respectively capture the long-range diffusion dependencies and learn hierarchy diffusion representation to facilitate better restoration. In the coarse training stage, our C2F-DFT estimates noises and then generates the final clean image by a sampling algorithm. To further improve the restoration quality, we propose a simple yet effective fine training scheme. It first exploits the coarse-trained diffusion model with fixed steps to generate restoration results, which then would be constrained with corresponding ground-truth ones to optimize the models to remedy the unsatisfactory results affected by inaccurate noise estimation. Extensive experiments show that C2F-DFT significantly outperforms diffusion-based restoration method IR-SDE and achieves competitive performance compared with Transformer-based state-of-the-art methods on $3$ tasks, including image deraining, image deblurring, and real image denoising. Code is available at https://github.com/wlydlut/C2F-DFT.

NAMay 6, 2012
Solving Principal Component Pursuit in Linear Time via $l_1$ Filtering

Risheng Liu, Zhouchen Lin, Siming Wei et al.

In the past decades, exactly recovering the intrinsic data structure from corrupted observations, which is known as robust principal component analysis (RPCA), has attracted tremendous interests and found many applications in computer vision. Recently, this problem has been formulated as recovering a low-rank component and a sparse component from the observed data matrix. It is proved that under some suitable conditions, this problem can be exactly solved by principal component pursuit (PCP), i.e., minimizing a combination of nuclear norm and $l_1$ norm. Most of the existing methods for solving PCP require singular value decompositions (SVD) of the data matrix, resulting in a high computational complexity, hence preventing the applications of RPCA to very large scale computer vision problems. In this paper, we propose a novel algorithm, called $l_1$ filtering, for \emph{exactly} solving PCP with an $O(r^2(m+n))$ complexity, where $m\times n$ is the size of data matrix and $r$ is the rank of the matrix to recover, which is supposed to be much smaller than $m$ and $n$. Moreover, $l_1$ filtering is \emph{highly parallelizable}. It is the first algorithm that can \emph{exactly} solve a nuclear norm minimization problem in \emph{linear time} (with respect to the data size). Experiments on both synthetic data and real applications testify to the great advantage of $l_1$ filtering in speed over state-of-the-art algorithms.

CVJul 14, 2024
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Qinyu Yang, Haoxin Chen, Yong Zhang et al.

In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation that considers both visual quality and consistency of content. Consistency of content is ensured by a proposed loss function that maintains the structure of the input, while visual quality is improved by utilizing the denoising process of pretrained diffusion models. To address the formulated optimization problem, we have developed a plug-and-play noise optimization strategy, referred to as Noise Calibration. By refining the initial random noise through a few iterations, the content of original video can be largely preserved, and the enhancement effect demonstrates a notable improvement. Extensive experiments have demonstrated the effectiveness of the proposed method.

CVMay 22, 2025Code
Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey

Liyan Wang, Weixiang Zhou, Cong Wang et al.

Ultra-high-definition (UHD) image restoration aims to specifically solve the problem of quality degradation in ultra-high-resolution images. Recent advancements in this field are predominantly driven by deep learning-based innovations, including enhancements in dataset construction, network architecture, sampling strategies, prior knowledge integration, and loss functions. In this paper, we systematically review recent progress in UHD image restoration, covering various aspects ranging from dataset construction to algorithm design. This serves as a valuable resource for understanding state-of-the-art developments in the field. We begin by summarizing degradation models for various image restoration subproblems, such as super-resolution, low-light enhancement, deblurring, dehazing, deraining, and desnowing, and emphasizing the unique challenges of their application to UHD image restoration. We then highlight existing UHD benchmark datasets and organize the literature according to degradation types and dataset construction methods. Following this, we showcase major milestones in deep learning-driven UHD image restoration, reviewing the progression of restoration tasks, technological developments, and evaluations of existing methods. We further propose a classification framework based on network architectures and sampling strategies, helping to clearly organize existing methods. Finally, we share insights into the current research landscape and propose directions for further advancements. A related repository is available at https://github.com/wlydlut/UHD-Image-Restoration-Survey.

CVSep 27, 2025Code
GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval

Zhaohua Zhang, Jianhuan Zhuo, Muxi Chen et al.

The CLIP model has become a cornerstone of large-scale retrieval systems by aligning text and image data in a unified embedding space. Despite its simplicity and efficiency, CLIP struggles when applied to tasks whose input distributions diverge from its training corpus, such as queries with multilingual, long-form, or multimodal differences. To avoid costly retraining, existing methods mainly adopt query-rewriting strategies with large language models (LLMs), aiming to mitigate distribution gaps at the query level. However, due to the lack of supervision signals, LLMs fail to generate the optimal one that fits the training distribution. We address this challenge with GRAPE (Grouped Ranking-Aware Policy Optimization Enhancement), a plug-and-play enhancement approach that incorporates ranking signals into retrieval-guided query rewriting with LLMs. Intuitively, GRAPE proposes to leverage GRPO to bridge distributional differences -- including length, multilingual, and modality shifts -- by transforming queries into forms better aligned with the retriever's training distribution. However, our preliminary experiment finds that naively finetuning LLM with similarity scores can lead to score inflation, where nearly all candidates are assigned unexpectedly high scores regardless of their true relevance. To address score inflation, we propose a corpus-relative ranking-based reward, which explicitly aligns optimization with ranking metrics while suppressing spurious score inflation. Extensive experiments demonstrate that GRAPE consistently improves retrieval performance under distributional shifts -- including multilingual differences (Flickr30k-CN, CVLUE, XM3600), length differences (Wikipedia), and multimodal differences (CIRR) -- achieving an average improvement of 4.9\% in Recall\@10. The code is available at https://github.com/Chinese0123456/GRAPE.git

CVJul 24, 2025Code
Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement

Xiaoran Sun, Liyan Wang, Cong Wang et al.

Most existing low-light image enhancement (LLIE) methods rely on pre-trained model priors, low-light inputs, or both, while neglecting the semantic guidance available from normal-light images. This limitation hinders their effectiveness in complex lighting conditions. In this paper, we propose VLM-IMI, a novel framework that leverages large vision-language models (VLMs) with iterative and manual instructions (IMIs) for LLIE. VLM-IMI incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration. To effectively integrate cross-modal priors, we introduce an instruction prior fusion module, which dynamically aligns and fuses image and text features, promoting the generation of detailed and semantically coherent outputs. During inference, we adopt an iterative and manual instruction strategy to refine textual instructions, progressively improving visual quality. This refinement enhances structural fidelity, semantic alignment, and the recovery of fine details under extremely low-light conditions. Extensive experiments across diverse scenarios demonstrate that VLM-IMI outperforms state-of-the-art methods in both quantitative metrics and perceptual quality. The source code is available at https://github.com/sunxiaoran01/VLM-IMI.

CVJun 19, 2024Code
Ultra-High-Definition Image Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution

Liyan Wang, Cong Wang, Jinshan Pan et al.

Ultra-High-Definition (UHD) image restoration has acquired remarkable attention due to its practical demand. In this paper, we construct UHD snow and rain benchmarks, named UHD-Snow and UHD-Rain, to remedy the deficiency in this field. The UHD-Snow/UHD-Rain is established by simulating the physics process of rain/snow into consideration and each benchmark contains 3200 degraded/clear image pairs of 4K resolution. Furthermore, we propose an effective UHD image restoration solution by considering gradient and normal priors in model design thanks to these priors' spatial and detail contributions. Specifically, our method contains two branches: (a) feature fusion and reconstruction branch in high-resolution space and (b) prior feature interaction branch in low-resolution space. The former learns high-resolution features and fuses prior-guided low-resolution features to reconstruct clear images, while the latter utilizes normal and gradient priors to mine useful spatial features and detail features to guide high-resolution recovery better. To better utilize these priors, we introduce single prior feature interaction and dual prior feature interaction, where the former respectively fuses normal and gradient priors with high-resolution features to enhance prior ones, while the latter calculates the similarity between enhanced prior ones and further exploits dual guided filtering to boost the feature interaction of dual priors. We conduct experiments on both new and existing public datasets and demonstrate the state-of-the-art performance of our method on UHD image low-light enhancement, dehazing, deblurring, desonwing, and deraining. The source codes and benchmarks are available at \url{https://github.com/wlydlut/UHDDIP}.

CVJun 28, 2020Code
SMPR: Single-Stage Multi-Person Pose Regression

Junqi Lin, Huixin Miao, Junjie Cao et al.

Existing multi-person pose estimators can be roughly divided into two-stage approaches (top-down and bottom-up approaches) and one-stage approaches. The two-stage methods either suffer high computational redundancy for additional person detectors or group keypoints heuristically after predicting all the instance-free keypoints. The recently proposed single-stage methods do not rely on the above two extra stages but have lower performance than the latest bottom-up approaches. In this work, a novel single-stage multi-person pose regression, termed SMPR, is presented. It follows the paradigm of dense prediction and predicts instance-aware keypoints from every location. Besides feature aggregation, we propose better strategies to define positive pose hypotheses for training which all play an important role in dense pose estimation. The network also learns the scores of estimated poses. The pose scoring strategy further improves the pose estimation performance by prioritizing superior poses during non-maximum suppression (NMS). We show that our method not only outperforms existing single-stage methods and but also be competitive with the latest bottom-up methods, with 70.2 AP and 77.5 AP75 on the COCO test-dev pose benchmark. Code is available at https://github.com/cmdi-dlut/SMPR.

IVFeb 29, 2024
LoLiSRFlow: Joint Single Image Low-light Enhancement and Super-resolution via Cross-scale Transformer-based Conditional Flow

Ziyu Yue, Jiaxin Gao, Sihan Xie et al.

The visibility of real-world images is often limited by both low-light and low-resolution, however, these issues are only addressed in the literature through Low-Light Enhancement (LLE) and Super- Resolution (SR) methods. Admittedly, a simple cascade of these approaches cannot work harmoniously to cope well with the highly ill-posed problem for simultaneously enhancing visibility and resolution. In this paper, we propose a normalizing flow network, dubbed LoLiSRFLow, specifically designed to consider the degradation mechanism inherent in joint LLE and SR. To break the bonds of the one-to-many mapping for low-light low-resolution images to normal-light high-resolution images, LoLiSRFLow directly learns the conditional probability distribution over a variety of feasible solutions for high-resolution well-exposed images. Specifically, a multi-resolution parallel transformer acts as a conditional encoder that extracts the Retinex-induced resolution-and-illumination invariant map as the previous one. And the invertible network maps the distribution of usually exposed high-resolution images to a latent distribution. The backward inference is equivalent to introducing an additional constrained loss for the normal training route, thus enabling the manifold of the natural exposure of the high-resolution image to be immaculately depicted. We also propose a synthetic dataset modeling the realistic low-light low-resolution degradation, named DFSR-LLE, containing 7100 low-resolution dark-light/high-resolution normal sharp pairs. Quantitative and qualitative experimental results demonstrate the effectiveness of our method on both the proposed synthetic and real datasets.

CVJun 3, 2025
ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration

Cheng Yang, Lijing Liang, Zhixun Su

This paper proposes ControlMambaIR, a novel image restoration method designed to address perceptual challenges in image deraining, deblurring, and denoising tasks. By integrating the Mamba network architecture with the diffusion model, the condition network achieves refined conditional control, thereby enhancing the control and optimization of the image generation process. To evaluate the robustness and generalization capability of our method across various image degradation conditions, extensive experiments were conducted on several benchmark datasets, including Rain100H, Rain100L, GoPro, and SSID. The results demonstrate that our proposed approach consistently surpasses existing methods in perceptual quality metrics, such as LPIPS and FID, while maintaining comparable performance in image distortion metrics, including PSNR and SSIM, highlighting its effectiveness and adaptability. Notably, ablation experiments reveal that directly noise prediction in the diffusion process achieves better performance, effectively balancing noise suppression and detail preservation. Furthermore, the findings indicate that the Mamba architecture is particularly well-suited as a conditional control network for diffusion models, outperforming both CNN- and Attention-based approaches in this context. Overall, these results highlight the flexibility and effectiveness of ControlMambaIR in addressing a range of image restoration perceptual challenges.

CVMay 8, 2023
Real-World Denoising via Diffusion Model

Cheng Yang, Lijing Liang, Zhixun Su

Real-world image denoising is an extremely important image processing problem, which aims to recover clean images from noisy images captured in natural environments. In recent years, diffusion models have achieved very promising results in the field of image generation, outperforming previous generation models. However, it has not been widely used in the field of image denoising because it is difficult to control the appropriate position of the added noise. Inspired by diffusion models, this paper proposes a novel general denoising diffusion model that can be used for real-world image denoising. We introduce a diffusion process with linear interpolation, and the intermediate noisy image is interpolated from the original clean image and the corresponding real-world noisy image, so that this diffusion model can handle the level of added noise. In particular, we also introduce two sampling algorithms for this diffusion model. The first one is a simple sampling procedure defined according to the diffusion process, and the second one targets the problem of the first one and makes a number of improvements. Our experimental results show that our proposed method with a simple CNNs Unet achieves comparable results compared to the Transformer architecture. Both quantitative and qualitative evaluations on real-world denoising benchmarks show that the proposed general diffusion model performs almost as well as against the state-of-the-art methods.

CVAug 6, 2020
Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining Network

Cong Wang, Yutong Wu, Zhixun Su et al.

In the field of multimedia, single image deraining is a basic pre-processing work, which can greatly improve the visual effect of subsequent high-level tasks in rainy conditions. In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem and conduct the segmentation and detection task for applications. Specifically, considering the important information on multi-scale features, we propose a Scale-Aggregation module to learn the features with different scales. Simultaneously, Self-Attention module is introduced to match or outperform their convolutional counterparts, which allows the feature aggregation to adapt to each channel. Furthermore, to improve the basic convolutional feature transformation process of Convolutional Neural Networks (CNNs), Self-Calibrated convolution is applied to build long-range spatial and inter-channel dependencies around each spatial location that explicitly expand fields-of-view of each convolutional layer through internal communications and hence enriches the output features. By designing the Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution skillfully, the proposed model has better deraining results both on real-world and synthetic datasets. Extensive experiments are conducted to demonstrate the superiority of our method compared with state-of-the-art methods. The source code will be available at \url{https://supercong94.wixsite.com/supercong94}.

CVAug 3, 2020
DCSFN: Deep Cross-scale Fusion Network for Single Image Rain Removal

Cong Wang, Xiaoying Xing, Zhixun Su et al.

Rain removal is an important but challenging computer vision task as rain streaks can severely degrade the visibility of images that may make other visions or multimedia tasks fail to work. Previous works mainly focused on feature extraction and processing or neural network structure, while the current rain removal methods can already achieve remarkable results, training based on single network structure without considering the cross-scale relationship may cause information drop-out. In this paper, we explore the cross-scale manner between networks and inner-scale fusion operation to solve the image rain removal task. Specifically, to learn features with different scales, we propose a multi-sub-networks structure, where these sub-networks are fused via a crossscale manner by Gate Recurrent Unit to inner-learn and make full use of information at different scales in these sub-networks. Further, we design an inner-scale connection block to utilize the multi-scale information and features fusion way between different scales to improve rain representation ability and we introduce the dense block with skip connection to inner-connect these blocks. Experimental results on both synthetic and real-world datasets have demonstrated the superiority of our proposed method, which outperforms over the state-of-the-art methods. The source code will be available at https://supercong94.wixsite.com/supercong94.

IVMar 30, 2020
Physical Model Guided Deep Image Deraining

Honghe Zhu, Cong Wang, Yajie Zhang et al.

Single image deraining is an urgent task because the degraded rainy image makes many computer vision systems fail to work, such as video surveillance and autonomous driving. So, deraining becomes important and an effective deraining algorithm is needed. In this paper, we propose a novel network based on physical model guided learning for single image deraining, which consists of three sub-networks: rain streaks network, rain-free network, and guide-learning network. The concatenation of rain streaks and rain-free image that are estimated by rain streaks network, rain-free network, respectively, is input to the guide-learning network to guide further learning and the direct sum of the two estimated images is constrained with the input rainy image based on the physical model of rainy image. Moreover, we further develop the Multi-Scale Residual Block (MSRB) to better utilize multi-scale information and it is proved to boost the deraining performance. Quantitative and qualitative experimental results demonstrate that the proposed method outperforms the state-of-the-art deraining methods. The source code will be available at \url{https://supercong94.wixsite.com/supercong94}.

CVMar 20, 2019
Non-rigid 3D shape retrieval based on multi-view metric learning

Haohao Li, Shengfa Wang, Nannan Li et al.

This study presents a novel multi-view metric learning algorithm, which aims to improve 3D non-rigid shape retrieval. With the development of non-rigid 3D shape analysis, there exist many shape descriptors. The intrinsic descriptors can be explored to construct various intrinsic representations for non-rigid 3D shape retrieval task. The different intrinsic representations (features) focus on different geometric properties to describe the same 3D shape, which makes the representations are related. Therefore, it is possible and necessary to learn multiple metrics for different representations jointly. We propose an effective multi-view metric learning algorithm by extending the Marginal Fisher Analysis (MFA) into the multi-view domain, and exploring Hilbert-Schmidt Independence Criteria (HSCI) as a diversity term to jointly learning the new metrics. The different classes can be separated by MFA in our method. Meanwhile, HSCI is exploited to make the multiple representations to be consensus. The learned metrics can reduce the redundancy between the multiple representations, and improve the accuracy of the retrieval results. Experiments are performed on SHREC'10 benchmarks, and the results show that the proposed method outperforms the state-of-the-art non-rigid 3D shape retrieval methods.

CVJan 31, 2018
Learning Video-Story Composition via Recurrent Neural Network

Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu et al.

In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience. We learn the coherence between video clips from real videos via the Recurrent Neural Network (RNN) that jointly incorporates the spatial-temporal semantics and motion dynamics to generate smooth and relevant compositions. We further rearrange the results generated by the RNN to make the overall video-story compatible with the storyline structure via a submodular ranking optimization process. Experimental results on the video-story dataset show that the proposed algorithm outperforms the state-of-the-art approach.

CVDec 25, 2017
Deep Blind Image Inpainting

Yang Liu, Jinshan Pan, Zhixun Su

Image inpainting is a challenging problem as it needs to fill the information of the corrupted regions. Most of the existing inpainting algorithms assume that the positions of the corrupted regions are known. Different from the existing methods that usually make some assumptions on the corrupted regions, we present an efficient blind image inpainting algorithm to directly restore a clear image from a corrupted input. Our algorithm is motivated by the residual learning algorithm which aims to learn the missing infor- mation in corrupted regions. However, directly using exist- ing residual learning algorithms in image restoration does not well solve this problem as little information is available in the corrupted regions. To solve this problem, we introduce an encoder and decoder architecture to capture more useful information and develop a robust loss function to deal with outliers. Our algorithm can predict the missing information in the corrupted regions, thus facilitating the clear image restoration. Both qualitative and quantitative experimental demonstrate that our algorithm can deal with the corrupted regions of arbitrary shapes and performs favorably against state-of-the-art methods.

CVFeb 28, 2017
An Optimization Framework with Flexible Inexact Inner Iterations for Nonconvex and Nonsmooth Programming

Yiyang Wang, Risheng Liu, Xiaoliang Song et al.

In recent years, numerous vision and learning tasks have been (re)formulated as nonconvex and nonsmooth programmings(NNPs). Although some algorithms have been proposed for particular problems, designing fast and flexible optimization schemes with theoretical guarantee is a challenging task for general NNPs. It has been investigated that performing inexact inner iterations often benefit to special applications case by case, but their convergence behaviors are still unclear. Motivated by these practical experiences, this paper designs a novel algorithmic framework, named inexact proximal alternating direction method (IPAD) for solving general NNPs. We demonstrate that any numerical algorithms can be incorporated into IPAD for solving subproblems and the convergence of the resulting hybrid schemes can be consistently guaranteed by a series of simple error conditions. Beyond the guarantee in theory, numerical experiments on both synthesized and real-world data further demonstrate the superiority and flexibility of our IPAD framework for practical use.

CVMay 28, 2016
Sparse Coding and Counting for Robust Visual Tracking

Risheng Liu, Jing Wang, Yiyang Wang et al.

In this paper, we propose a novel sparse coding and counting method under Bayesian framwork for visual tracking. In contrast to existing methods, the proposed method employs the combination of L0 and L1 norm to regularize the linear coefficients of incrementally updated linear basis. The sparsity constraint enables the tracker to effectively handle difficult challenges, such as occlusion or image corruption. To achieve realtime processing, we propose a fast and efficient numerical algorithm for solving the proposed model. Although it is an NP-hard problem, the proposed accelerated proximal gradient (APG) approach is guaranteed to converge to a solution quickly. Besides, we provide a closed solution of combining L0 and L1 regularized representation to obtain better sparsity. Experimental results on challenging video sequences demonstrate that the proposed method achieves state-of-the-art results both in accuracy and speed.

NAMar 24, 2015
Parallel matrix factorization for low-rank tensor completion

Yangyang Xu, Ruru Hao, Wotao Yin et al.

Higher-order low-rank tensors naturally arise in many applications including hyperspectral data recovery, video inpainting, seismic data recon- struction, and so on. We propose a new model to recover a low-rank tensor by simultaneously performing low-rank matrix factorizations to the all-mode ma- tricizations of the underlying tensor. An alternating minimization algorithm is applied to solve the model, along with two adaptive rank-adjusting strategies when the exact rank is not known. Phase transition plots reveal that our algorithm can recover a variety of synthetic low-rank tensors from significantly fewer samples than the compared methods, which include a matrix completion method applied to tensor recovery and two state-of-the-art tensor completion methods. Further tests on real- world data show similar advantages. Although our model is non-convex, our algorithm performs consistently throughout the tests and give better results than the compared methods, some of which are based on convex models. In addition, the global convergence of our algorithm can be established in the sense that the gradient of Lagrangian function converges to zero.

CVMar 9, 2012
Fixed-Rank Representation for Unsupervised Visual Learning

Risheng Liu, Zhouchen Lin, Fernando De la Torre et al.

Subspace clustering and feature extraction are two of the most commonly used unsupervised learning techniques in computer vision and pattern recognition. State-of-the-art techniques for subspace clustering make use of recent advances in sparsity and rank minimization. However, existing techniques are computationally expensive and may result in degenerate solutions that degrade clustering performance in the case of insufficient data sampling. To partially solve these problems, and inspired by existing work on matrix factorization, this paper proposes fixed-rank representation (FRR) as a unified framework for unsupervised visual learning. FRR is able to reveal the structure of multiple subspaces in closed-form when the data is noiseless. Furthermore, we prove that under some suitable conditions, even with insufficient observations, FRR can still reveal the true subspace memberships. To achieve robustness to outliers and noise, a sparse regularizer is introduced into the FRR framework. Beyond subspace clustering, FRR can be used for unsupervised feature extraction. As a non-trivial byproduct, a fast numerical solver is developed for FRR. Experimental results on both synthetic data and real applications validate our theoretical analysis and demonstrate the benefits of FRR for unsupervised visual learning.