48.3CVJun 2
Beyond Single Solution: Multi-Hypothesis Collaborative Deep Unfolding Network for Image Compressive SensingWenxue Cui, Hualin Li, Yuhang Qin et al.
Recent deep unfolding networks (DUNs) have advanced Compressive Sensing (CS) by effectively integrating iterative optimization with deep learning architectures. However, most CS approaches predominantly confine their inference to a single solution space, neglecting the inherent ill-posedness of CS problems that intrinsically permits multiple plausible candidate hypotheses. In this paper, a novel Multi-Hypothesis Collaborative Deep Unfolding CS Network (MHC-DUN) is proposed, which explicitly models and leverages multiple hypotheses by jointly optimizing across diverse solution spaces. Specifically, following the Proximal Gradient Descent algorithm, MHC-DUN jointly performs gradient descent and proximal mapping within this multi-hypothesis paradigm. i) For gradient descent, a well-designed AlphaNet is introduced to dynamically predict spatially varying step sizes for all hypotheses, enabling collaborative gradient updates across multiple solutions. ii) For proximal operator, a sophisticated multi-hypothesis collaborative proximal mapping module is designed, which leverages both intra-hypothesis and inter-hypothesis correlation priors to jointly refine multiple solutions. To enable end-to-end training, a novel composite loss function is designed, which balances measurement fidelity, hypothesis diversity, and reconstruction accuracy, encouraging exploration of complementary solutions while maintaining reconstruction fidelity. Experimental results reveal that the proposed CS method outperforms existing CS networks.
CVFeb 19, 2023Code
Guided Depth Map Super-resolution: A SurveyZhiwei Zhong, Xianming Liu, Junjun Jiang et al.
Guided depth map super-resolution (GDSR), which aims to reconstruct a high-resolution (HR) depth map from a low-resolution (LR) observation with the help of a paired HR color image, is a longstanding and fundamental problem, it has attracted considerable attention from computer vision and image processing communities. A myriad of novel and effective approaches have been proposed recently, especially with powerful deep learning techniques. This survey is an effort to present a comprehensive survey of recent progress in GDSR. We start by summarizing the problem of GDSR and explaining why it is challenging. Next, we introduce some commonly used datasets and image quality assessment methods. In addition, we roughly classify existing GDSR methods into three categories, i.e., filtering-based methods, prior-based methods, and learning-based methods. In each category, we introduce the general description of the published algorithms and design principles, summarize the representative methods, and discuss their highlights and limitations. Moreover, the depth related applications are introduced. Furthermore, we conduct experiments to evaluate the performance of some representative methods based on unified experimental configurations, so as to offer a systematic and fair performance evaluation to readers. Finally, we conclude this survey with possible directions and open problems for further research. All the related materials can be found at \url{https://github.com/zhwzhong/Guided-Depth-Map-Super-resolution-A-Survey}.
CVOct 29, 2023Code
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly DetectionYuanze Li, Haolin Wang, Shihao Yuan et al.
Due to the training configuration, traditional industrial anomaly detection (IAD) methods have to train a specific model for each deployment scenario, which is insufficient to meet the requirements of modern design and manufacturing. On the contrary, large multimodal models~(LMMs) have shown eminent generalization ability on various vision tasks, and their perception and comprehension capabilities imply the potential of applying LMMs on IAD tasks. However, we observe that even though the LMMs have abundant knowledge about industrial anomaly detection in the textual domain, the LMMs are unable to leverage the knowledge due to the modality gap between textual and visual domains. To stimulate the relevant knowledge in LMMs and adapt the LMMs towards anomaly detection tasks, we introduce existing IAD methods as vision experts and present a novel large multimodal model applying vision experts for industrial anomaly detection~(abbreviated to {Myriad}). Specifically, we utilize the anomaly map generated by the vision experts as guidance for LMMs, such that the vision model is guided to pay more attention to anomalous regions. Then, the visual features are modulated via an adapter to fit the anomaly detection tasks, which are fed into the language model together with the vision expert guidance and human instructions to generate the final outputs. Extensive experiments are applied on MVTec-AD, VisA, and PCB Bank benchmarks demonstrate that our proposed method not only performs favorably against state-of-the-art methods, but also inherits the flexibility and instruction-following ability of LMMs in the field of IAD. Source code and pre-trained models are publicly available at \url{https://github.com/tzjtatata/Myriad}.
CVAug 3, 2022
Fast Hierarchical Deep Unfolding Network for Image Compressed SensingWenxue Cui, Shaohui Liu, Debin Zhao
By integrating certain optimization solvers with deep neural network, deep unfolding network (DUN) has attracted much attention in recent years for image compressed sensing (CS). However, there still exist several issues in existing DUNs: 1) For each iteration, a simple stacked convolutional network is usually adopted, which apparently limits the expressiveness of these models. 2) Once the training is completed, most hyperparameters of existing DUNs are fixed for any input content, which significantly weakens their adaptability. In this paper, by unfolding the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), a novel fast hierarchical DUN, dubbed FHDUN, is proposed for image compressed sensing, in which a well-designed hierarchical unfolding architecture is developed to cooperatively explore richer contextual prior information in multi-scale spaces. To further enhance the adaptability, series of hyperparametric generation networks are developed in our framework to dynamically produce the corresponding optimal hyperparameters according to the input content. Furthermore, due to the accelerated policy in FISTA, the newly embedded acceleration module makes the proposed FHDUN save more than 50% of the iterative loops against recent DUNs. Extensive CS experiments manifest that the proposed FHDUN outperforms existing state-of-the-art CS methods, while maintaining fewer iterations.
CVOct 16, 2023
Deep Unfolding Network for Image Compressed Sensing by Content-adaptive Gradient Updating and Deformation-invariant Non-local ModelingWenxue Cui, Xiaopeng Fan, Jian Zhang et al.
Inspired by certain optimization solvers, the deep unfolding network (DUN) has attracted much attention in recent years for image compressed sensing (CS). However, there still exist the following two issues: 1) In existing DUNs, most hyperparameters are usually content independent, which greatly limits their adaptability for different input contents. 2) In each iteration, a plain convolutional neural network is usually adopted, which weakens the perception of wider context prior and therefore depresses the expressive ability. In this paper, inspired by the traditional Proximal Gradient Descent (PGD) algorithm, a novel DUN for image compressed sensing (dubbed DUN-CSNet) is proposed to solve the above two issues. Specifically, for the first issue, a novel content adaptive gradient descent network is proposed, in which a well-designed step size generation sub-network is developed to dynamically allocate the corresponding step sizes for different textures of input image by generating a content-aware step size map, realizing a content-adaptive gradient updating. For the second issue, considering the fact that many similar patches exist in an image but have undergone a deformation, a novel deformation-invariant non-local proximal mapping network is developed, which can adaptively build the long-range dependencies between the nonlocal patches by deformation-invariant non-local modeling, leading to a wider perception on context priors. Extensive experiments manifest that the proposed DUN-CSNet outperforms existing state-of-the-art CS methods by large margins.
AISep 9, 2022
Task-Agnostic Learning to Accomplish New TasksXianqi Zhang, Xingtao Wang, Xu Liu et al.
Reinforcement Learning (RL) and Imitation Learning (IL) have made great progress in robotic decision-making in recent years. However, these methods show obvious deterioration for new tasks that need to be completed through new combinations of actions. RL methods suffer from reward functions and distribution shifts, while IL methods are limited by expert demonstrations which do not cover new tasks. In contrast, humans can easily complete these tasks with the fragmented knowledge learned from task-agnostic experience. Inspired by this observation, this paper proposes a task-agnostic learning method (TAL for short) that can learn fragmented knowledge only from task-agnostic data to accomplish new tasks. TAL consists of four stages. First, the task-agnostic exploration is performed to collect data from interactions with the environment. The collected data is organized via a knowledge graph. Second, an action feature extractor is proposed and trained using the collected knowledge graph data for task-agnostic fragmented knowledge learning. Third, a candidate action generator is designed, which applies the action feature extractor on a new task to generate multiple candidate action sets. Finally, an action proposal network is designed to produce the probabilities for actions in a new task according to the environmental information. The probabilities are then used to generate order information for selecting actions to be executed from multiple candidate action sets to form the plan. Experiments on a virtual indoor scene show that the proposed method outperforms the state-of-the-art offline RL methods and IL methods by more than 20%.
CVNov 11, 2025Code
Perceptual Quality Assessment of 3D Gaussian Splatting: A Subjective Dataset and Prediction MetricZhaolin Wan, Yining Diao, Jingqi Xu et al.
With the rapid advancement of 3D visualization, 3D Gaussian Splatting (3DGS) has emerged as a leading technique for real-time, high-fidelity rendering. While prior research has emphasized algorithmic performance and visual fidelity, the perceptual quality of 3DGS-rendered content, especially under varying reconstruction conditions, remains largely underexplored. In practice, factors such as viewpoint sparsity, limited training iterations, point downsampling, noise, and color distortions can significantly degrade visual quality, yet their perceptual impact has not been systematically studied. To bridge this gap, we present 3DGS-QA, the first subjective quality assessment dataset for 3DGS. It comprises 225 degraded reconstructions across 15 object types, enabling a controlled investigation of common distortion factors. Based on this dataset, we introduce a no-reference quality prediction model that directly operates on native 3D Gaussian primitives, without requiring rendered images or ground-truth references. Our model extracts spatial and photometric cues from the Gaussian representation to estimate perceived quality in a structure-aware manner. We further benchmark existing quality assessment methods, spanning both traditional and learning-based approaches. Experimental results show that our method consistently achieves superior performance, highlighting its robustness and effectiveness for 3DGS content evaluation. The dataset and code are made publicly available at https://github.com/diaoyn/3DGSQA to facilitate future research in 3DGS quality assessment.
49.7CVApr 1
LG-HCC: Local Geometry-Aware Hierarchical Context Compression for 3D Gaussian SplattingXuan Deng, Xiandong Meng, Hengyu Man et al.
Although 3D Gaussian Splatting (3DGS) enables high-fidelity real-time rendering, its prohibitive storage overhead severely hinders practical deployment. Recent anchor-based 3DGS compression schemes reduce gaussian redundancy through some advanced context models. However, they overlook explicit geometric dependencies, leading to structural degradation and suboptimal ratedistortion performance. In this paper, we propose a Local Geometry-aware Hierarchical Context Compression framework for 3DGS(LG-HCC) that incorporates inter-anchor geometric correlations into anchor pruning and entropy coding for compact representation. Specifically, we introduce an Neighborhood-Aware Anchor Pruning (NAAP) strategy, which evaluates anchor importance via weighted neighborhood feature aggregation and then merges low-contribution anchors into salient neighbors, yielding a compact yet geometry-consistent anchor set. Moreover, we further develop a hierarchical entropy coding scheme, in which coarse-to-fine priors are exploited through a lightweight Geometry-Guided Convolution(GG-Conv) operator to enable spatially adaptive context modeling and rate-distortion optimization. Extensive experiments show that LG-HCC effectively alleviates structural preservation issues,achieving superior geometric integrity and rendering fidelity while reducing storage by up to 30.85x compared to the Scaffold-GS baseline on the Mip-NeRF360 dataset
CVNov 10, 2025
MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image CompressionHan Liu, Hengyu Man, Xingtao Wang et al.
Recent advances in extreme image compression have revealed that mapping pixel data into highly compact latent representations can significantly improve coding efficiency. However, most existing methods compress images into 2-D latent spaces via convolutional neural networks (CNNs) or Swin Transformers, which tend to retain substantial spatial redundancy, thereby limiting overall compression performance. In this paper, we propose a novel Mixed RWKV-Transformer (MRT) architecture that encodes images into more compact 1-D latent representations by synergistically integrating the complementary strengths of linear-attention-based RWKV and self-attention-based Transformer models. Specifically, MRT partitions each image into fixed-size windows, utilizing RWKV modules to capture global dependencies across windows and Transformer blocks to model local redundancies within each window. The hierarchical attention mechanism enables more efficient and compact representation learning in the 1-D domain. To further enhance compression efficiency, we introduce a dedicated RWKV Compression Model (RCM) tailored to the structure characteristics of the intermediate 1-D latent features in MRT. Extensive experiments on standard image compression benchmarks validate the effectiveness of our approach. The proposed MRT framework consistently achieves superior reconstruction quality at bitrates below 0.02 bits per pixel (bpp). Quantitative results based on the DISTS metric show that MRT significantly outperforms the state-of-the-art 2-D architecture GLC, achieving bitrate savings of 43.75%, 30.59% on the Kodak and CLIC2020 test datasets, respectively.
58.8CVMay 12
PairDropGS: Paired Dropout-Induced Consistency Regularization for Sparse-View Gaussian SplattingHantang Li, Qiang Zhu, Xiandong Meng et al.
Dropout-based sparse-view 3D Gaussian Splatting (3DGS) methods alleviate overfitting by randomly suppressing Gaussian primitives during training. Existing methods mainly focus on designing increasingly sophisticated dropout strategies, while they overlook the resulting inconsistencies among different dropped Gaussian subsets. This oversight often leads to unstable reconstruction and suboptimal Gaussian representation learning.In this paper, we revisit dropout-based sparse-view 3DGS from a consistency regularization perspective and propose PairDropGS, a Paired Dropout-induced Consistency Regularization framework for sparse-view Gaussian splatting. Specifically, PairDropGS first constructs a pair of the dropped Gaussian subsets from a shared Gaussian field and designs a low-frequency consistency regularization to constrain their low-frequency rendered structures. This design encourages the shared Gaussian field to preserve stable scene layout and coarse geometry under different random dropouts, while avoiding excessive constraints on ambiguous high-frequency details. Moreover, we introduce a progressive consistency scheduling strategy to gradually strengthen the consistency regularization during training for stability and robustness of reconstruction. Extensive experiments on widely-used sparse-view benchmarks demonstrate that PairDropGS achieves superior training stability, significantly outperforms existing dropout-based 3DGS methods in reconstruction quality, while exhibiting the simplicity and plug-and-play nature for improving dropout-based optimization.
CVAug 17, 2025Code
Region-Level Context-Aware Multimodal UnderstandingHongliang Wei, Xianqi Zhang, Xingtao Wang et al.
Despite significant progress, existing research on Multimodal Large Language Models (MLLMs) mainly focuses on general visual understanding, overlooking the ability to integrate textual context associated with objects for a more context-aware multimodal understanding -- an ability we refer to as Region-level Context-aware Multimodal Understanding (RCMU). To address this limitation, we first formulate the RCMU task, which requires models to respond to user instructions by integrating both image content and textual information of regions or objects. To equip MLLMs with RCMU capabilities, we propose Region-level Context-aware Visual Instruction Tuning (RCVIT), which incorporates object information into the model input and enables the model to utilize bounding box coordinates to effectively associate objects' visual content with their textual information. To address the lack of datasets, we introduce the RCMU dataset, a large-scale visual instruction tuning dataset that covers multiple RCMU tasks. We also propose RC\&P-Bench, a comprehensive benchmark that can evaluate the performance of MLLMs in RCMU and multimodal personalized understanding tasks. Additionally, we propose a reference-free evaluation metric to perform a comprehensive and fine-grained evaluation of the region-level context-aware image descriptions. By performing RCVIT on Qwen2-VL models with the RCMU dataset, we developed RC-Qwen2-VL models. Experimental results indicate that RC-Qwen2-VL models not only achieve outstanding performance on multiple RCMU tasks but also demonstrate successful applications in multimodal RAG and personalized conversation. Our data, model and benchmark are available at https://github.com/hongliang-wei/RC-MLLM
65.3CVApr 8
DOC-GS: Dual-Domain Observation and Calibration for Reliable Sparse-View Gaussian SplattingHantang Li, Qiang Zhu, Xiandong Meng et al.
Sparse-view reconstruction with 3D Gaussian Splatting (3DGS) is fundamentally ill-posed due to insufficient geometric supervision, often leading to severe overfitting and the emergence of structural distortions and translucent haze-like artifacts. While existing approaches attempt to alleviate this issue via dropout-based regularization, they are largely heuristic and lack a unified understanding of artifact formation. In this paper, we revisit sparse-view 3DGS reconstruction from a new perspective and identify the core challenge as the unobservability of Gaussian primitive reliability. Unreliable Gaussians are insufficiently constrained during optimization and accumulate as haze-like degradations in rendered images. Motivated by this observation, we propose a unified Dual-domain Observation and Calibration (DOC-GS) framework that models and corrects Gaussian reliability through the synergy of optimization-domain inductive bias and observation-domain evidence. Specifically, in the optimization domain, we characterize Gaussian reliability by the degree to which each primitive is constrained during training, and instantiate this signal via a Continuous Depth-Guided Dropout (CDGD) strategy, where the dropout probability serves as an explicit proxy for primitive reliability. This imposes a smooth depth-aware inductive bias to suppress weakly constrained Gaussians and improve optimization stability. In the observation domain, we establish a connection between floater artifacts and atmospheric scattering, and leverage the Dark Channel Prior (DCP) as a structural consistency cue to identify and accumulate anomalous regions. Based on cross-view aggregated evidence, we further design a reliability-driven geometric pruning strategy to remove low-confidence Gaussians.
84.5CVApr 5
NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge ResultsShuhong Liu, Chenyu Bao, Ziteng Cui et al.
This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify robust reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.
IVFeb 29, 2024
Deep Network for Image Compressed Sensing Coding Using Local Structural SamplingWenxue Cui, Xingtao Wang, Xiaopeng Fan et al.
Existing image compressed sensing (CS) coding frameworks usually solve an inverse problem based on measurement coding and optimization-based image reconstruction, which still exist the following two challenges: 1) The widely used random sampling matrix, such as the Gaussian Random Matrix (GRM), usually leads to low measurement coding efficiency. 2) The optimization-based reconstruction methods generally maintain a much higher computational complexity. In this paper, we propose a new CNN based image CS coding framework using local structural sampling (dubbed CSCNet) that includes three functional modules: local structural sampling, measurement coding and Laplacian pyramid reconstruction. In the proposed framework, instead of GRM, a new local structural sampling matrix is first developed, which is able to enhance the correlation between the measurements through a local perceptual sampling strategy. Besides, the designed local structural sampling matrix can be jointly optimized with the other functional modules during training process. After sampling, the measurements with high correlations are produced, which are then coded into final bitstreams by the third-party image codec. At last, a Laplacian pyramid reconstruction network is proposed to efficiently recover the target image from the measurement domain to the image domain. Extensive experimental results demonstrate that the proposed scheme outperforms the existing state-of-the-art CS coding methods, while maintaining fast computational speed.
35.7CVApr 2
MTLSI-Net: A Linear Semantic Interaction Network for Parameter-Efficient Multi-Task Dense PredictionChen Liu, Hengyu Man, Xiaopeng Fan et al.
Multi-task dense prediction aims to perform multiple pixel-level tasks simultaneously. However, capturing global cross-task interactions remains non-trivial due to the quadratic complexity of standard self-attention on high-resolution features. To address this limitation, we propose a Multi-Task Linear Semantic Interaction Network (MTLSI-Net), which facilitates cross-task interaction through linear attention. Specifically, MTLSI-Net incorporates three key components: a Multi-Task Multi-scale Query Linear Fusion Block, which captures cross-task dependencies across multiple scales with linear complexity using a shared global context matrix; a Semantic Token Distiller that compresses redundant features into compact semantic tokens, distilling essential cross-task knowledge; and a Cross-Window Integrated attention Block that injects global semantics into local features via a dual-branch architecture, preserving both global consistency and spatial precision. These components collectively enable the network to capture comprehensive cross-task interactions at linear complexity with reduced parameters. Extensive experiments on NYUDv2 and PASCAL-Context demonstrate that MTLSI-Net achieves state-of-the-art performance, validating its effectiveness and efficiency in multi-task learning.
CVJul 10, 2025
T-GVC: Trajectory-Guided Generative Video Coding at Ultra-Low BitratesZhitao Wang, Hengyu Man, Wenrui Li et al.
Recent advances in video generation techniques have given rise to an emerging paradigm of generative video coding for Ultra-Low Bitrate (ULB) scenarios by leveraging powerful generative priors. However, most existing methods are limited by domain specificity (e.g., facial or human videos) or excessive dependence on high-level text guidance, which tend to inadequately capture fine-grained motion details, leading to unrealistic or incoherent reconstructions. To address these challenges, we propose Trajectory-Guided Generative Video Coding (dubbed T-GVC), a novel framework that bridges low-level motion tracking with high-level semantic understanding. T-GVC features a semantic-aware sparse motion sampling pipeline that extracts pixel-wise motion as sparse trajectory points based on their semantic importance, significantly reducing the bitrate while preserving critical temporal semantic information. In addition, by integrating trajectory-aligned loss constraints into diffusion processes, we introduce a training-free guidance mechanism in latent space to ensure physically plausible motion patterns without sacrificing the inherent capabilities of generative models. Experimental results demonstrate that T-GVC outperforms both traditional and neural video codecs under ULB conditions. Furthermore, additional experiments confirm that our framework achieves more precise motion control than existing text-guided methods, paving the way for a novel direction of generative video coding guided by geometric motion modeling.
ROMar 28, 2025
FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and ManipulationXianqi Zhang, Hongliang Wei, Wenrui Wang et al.
Humanoid robots have attracted significant attention in recent years. Reinforcement Learning (RL) is one of the main ways to control the whole body of humanoid robots. RL enables agents to complete tasks by learning from environment interactions, guided by task rewards. However, existing RL methods rarely explicitly consider the impact of body stability on humanoid locomotion and manipulation. Achieving high performance in whole-body control remains a challenge for RL methods that rely solely on task rewards. In this paper, we propose a Foundation model-based method for humanoid Locomotion And Manipulation (FLAM for short). FLAM integrates a stabilizing reward function with a basic policy. The stabilizing reward function is designed to encourage the robot to learn stable postures, thereby accelerating the learning process and facilitating task completion. Specifically, the robot pose is first mapped to the 3D virtual human model. Then, the human pose is stabilized and reconstructed through a human motion reconstruction model. Finally, the pose before and after reconstruction is used to compute the stabilizing reward. By combining this stabilizing reward with the task reward, FLAM effectively guides policy learning. Experimental results on a humanoid robot benchmark demonstrate that FLAM outperforms state-of-the-art RL methods, highlighting its effectiveness in improving stability and overall performance.
CVSep 18, 2025
Bidirectional Feature-aligned Motion Transformation for Efficient Dynamic Point Cloud CompressionXuan Deng, Xingtao Wang, Xiandong Meng et al.
Efficient dynamic point cloud compression (DPCC) critically depends on accurate motion estimation and compensation. However, the inherently irregular structure and substantial local variations of point clouds make this task highly challenging. Existing approaches typically rely on explicit motion estimation, whose encoded motion vectors often fail to capture complex dynamics and inadequately exploit temporal correlations. To address these limitations, we propose a Bidirectional Feature-aligned Motion Transformation (Bi-FMT) framework that implicitly models motion in the feature space. Bi-FMT aligns features across both past and future frames to produce temporally consistent latent representations, which serve as predictive context in a conditional coding pipeline, forming a unified ``Motion + Conditional'' representation. Built upon this bidirectional feature alignment, we introduce a Cross-Transformer Refinement module (CTR) at the decoder side to adaptively refine locally aligned features. By modeling cross-frame dependencies with vector attention, CRT enhances local consistency and restores fine-grained spatial details that are often lost during motion alignment. Moreover, we design a Random Access (RA) reference strategy that treats the bidirectionally aligned features as conditional context, enabling frame-level parallel compression and eliminating the sequential encoding. Extensive experiments demonstrate that Bi-FMT surpasses D-DPCC and AdaDPCC in both compression efficiency and runtime, achieving BD-Rate reductions of 20% (D1) and 9.4% (D1), respectively.
CVSep 1, 2025
PVINet: Point-Voxel Interlaced Network for Point Cloud CompressionXuan Deng, Xingtao Wang, Xiandong Meng et al.
In point cloud compression, the quality of a reconstructed point cloud relies on both the global structure and the local context, with existing methods usually processing global and local information sequentially and lacking communication between these two types of information. In this paper, we propose a point-voxel interlaced network (PVINet), which captures global structural features and local contextual features in parallel and performs interactions at each scale to enhance feature perception efficiency. Specifically, PVINet contains a voxel-based encoder (Ev) for extracting global structural features and a point-based encoder (Ep) that models local contexts centered at each voxel. Particularly, a novel conditional sparse convolution is introduced, which applies point embeddings to dynamically customize kernels for voxel feature extraction, facilitating feature interactions from Ep to Ev. During decoding, a voxel-based decoder employs conditional sparse convolutions to incorporate point embeddings as guidance to reconstruct the point cloud. Experiments on benchmark datasets show that PVINet delivers competitive performance compared to state-of-the-art methods.
CVDec 13, 2021
Deep Attentional Guided Image FilteringZhiwei Zhong, Xianming Liu, Junjun Jiang et al.
Guided filter is a fundamental tool in computer vision and computer graphics which aims to transfer structure information from guidance image to target image. Most existing methods construct filter kernels from the guidance itself without considering the mutual dependency between the guidance and the target. However, since there typically exist significantly different edges in the two images, simply transferring all structural information of the guidance to the target would result in various artifacts. To cope with this problem, we propose an effective framework named deep attentional guided image filtering, the filtering process of which can fully integrate the complementary information contained in both images. Specifically, we propose an attentional kernel learning module to generate dual sets of filter kernels from the guidance and the target, respectively, and then adaptively combine them by modeling the pixel-wise dependency between the two images. Meanwhile, we propose a multi-scale guided image filtering module to progressively generate the filtering result with the constructed kernels in a coarse-to-fine manner. Correspondingly, a multi-scale fusion strategy is introduced to reuse the intermediate results in the coarse-to-fine process. Extensive experiments show that the proposed framework compares favorably with the state-of-the-art methods in a wide range of guided image filtering applications, such as guided super-resolution, cross-modality restoration, texture removal, and semantic segmentation.
IVDec 7, 2021
Image Compressed Sensing Using Non-local Neural NetworkWenxue Cui, Shaohui Liu, Feng Jiang et al.
Deep network-based image Compressed Sensing (CS) has attracted much attention in recent years. However, the existing deep network-based CS schemes either reconstruct the target image in a block-by-block manner that leads to serious block artifacts or train the deep network as a black box that brings about limited insights of image prior knowledge. In this paper, a novel image CS framework using non-local neural network (NL-CSNet) is proposed, which utilizes the non-local self-similarity priors with deep network to improve the reconstruction quality. In the proposed NL-CSNet, two non-local subnetworks are constructed for utilizing the non-local self-similarity priors in the measurement domain and the multi-scale feature domain respectively. Specifically, in the subnetwork of measurement domain, the long-distance dependencies between the measurements of different image blocks are established for better initial reconstruction. Analogically, in the subnetwork of multi-scale feature domain, the affinities between the dense feature representations are explored in the multi-scale space for deep reconstruction. Furthermore, a novel loss function is developed to enhance the coupling between the non-local representations, which also enables an end-to-end training of NL-CSNet. Extensive experiments manifest that NL-CSNet outperforms existing state-of-the-art CS methods, while maintaining fast computational speed.
MMJun 10, 2021
Tree-Structured Data Clustering-Driven Neural Network for Intra Prediction in Video CodingHengyu Man, Xiaopeng Fan, Ruiqin Xiong et al.
As a crucial part of video compression, intra prediction utilizes local information of images to eliminate the redundancy in spatial domain. In both the High Efficiency Video Coding (H.265/HEVC) and Versatile Video Coding (H.266/VVC), multiple directional prediction modes are employed to find the texture trend of each small block and then the prediction is made based on reference samples in the selected direction. Recently, the intra prediction schemes based on neural networks have achieved great success. In these methods, the networks are trained and applied to intra prediction to assist the directional prediction modes. In this paper, we propose a novel tree-structured data clustering-driven neural network (dubbed TreeNet) for intra prediction, which builds the networks and clusters the training data in a tree-structured manner. Specifically, in each network split and training process of TreeNet, every parent network on a leaf node is split into two child networks by adding or subtracting Gaussian random noise. Then a data clustering-driven training is applied to train the two derived child networks using the clustered training data of their parent. To test the performance, TreeNet is integrated into VVC and HEVC to combine with or replace the directional prediction modes. In addition, a fast termination strategy is proposed to accelerate the search of TreeNet. The experimental results demonstrate that TreeNet with the fast termination can reach an average of 2.8% Bjontegaard distortion rate (BD-rate) improvement (up to 8.1%) and 4.9% BD-rate improvement (up to 8.2%) over VVC (VTM-4.0) and HEVC (HM-16.9) with all intra configuration, respectively.
CVApr 4, 2021
High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal FusionZhiwei Zhong, Xianming Liu, Junjun Jiang et al.
Depth map records distance between the viewpoint and objects in the scene, which plays a critical role in many real-world applications. However, depth map captured by consumer-grade RGB-D cameras suffers from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to address this problem, which attempts to restore a high-resolution (HR) depth map from the input low-resolution (LR) depth and its coupled HR RGB image that serves as the guidance. The most challenging problems for guided DSR are how to correctly select consistent structures and propagate them, and properly handle inconsistent ones. In this paper, we propose a novel attention-based hierarchical multi-modal fusion (AHMF) network for guided DSR. Specifically, to effectively extract and combine relevant information from LR depth and HR guidance, we propose a multi-modal attention based fusion (MMAF) strategy for hierarchical convolutional layers, including a feature enhance block to select valuable features and a feature recalibration block to unify the similarity metrics of modalities with different appearance characteristics. Furthermore, we propose a bi-directional hierarchical feature collaboration (BHFC) module to fully leverage low-level spatial information and high-level structure information among multi-scale features. Experimental results show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
CVJan 6, 2021
Multi-Stage Residual Hiding for Image-into-Audio SteganographyWenxue Cui, Shaohui Liu, Feng Jiang et al.
The widespread application of audio communication technologies has speeded up audio data flowing across the Internet, which made it a popular carrier for covert communication. In this paper, we present a cross-modal steganography method for hiding image content into audio carriers while preserving the perceptual fidelity of the cover audio. In our framework, two multi-stage networks are designed: the first network encodes the decreasing multilevel residual errors inside different audio subsequences with the corresponding stage sub-networks, while the second network decodes the residual errors from the modified carrier with the corresponding stage sub-networks to produce the final revealed results. The multi-stage design of proposed framework not only make the controlling of payload capacity more flexible, but also make hiding easier because of the gradual sparse characteristic of residual errors. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded images are highly intelligible.
CVAug 17, 2018
Convolutional Neural Networks based Intra Prediction for HEVCWenxue Cui, Tao Zhang, Shengping Zhang et al.
Traditional intra prediction methods for HEVC rely on using the nearest reference lines for predicting a block, which ignore much richer context between the current block and its neighboring blocks and therefore cause inaccurate prediction especially when weak spatial correlation exists between the current block and the reference lines. To overcome this problem, in this paper, an intra prediction convolutional neural network (IPCNN) is proposed for intra prediction, which exploits the rich context of the current block and therefore is capable of improving the accuracy of predicting the current block. Meanwhile, the predictions of the three nearest blocks can also be refined. To the best of our knowledge, this is the first paper that directly applies CNNs to intra prediction for HEVC. Experimental results validate the effectiveness of applying CNNs to intra prediction and achieved significant performance improvement compared to traditional intra prediction methods.
CVJun 19, 2018
Deep neural network based sparse measurement matrix for image compressed sensingWenxue Cui, Feng Jiang, Xinwei Gao et al.
Gaussian random matrix (GRM) has been widely used to generate linear measurements in compressed sensing (CS) of natural images. However, there actually exist two disadvantages with GRM in practice. One is that GRM has large memory requirement and high computational complexity, which restrict the applications of CS. Another is that the CS measurements randomly obtained by GRM cannot provide sufficient reconstruction performances. In this paper, a Deep neural network based Sparse Measurement Matrix (DSMM) is learned by the proposed convolutional network to reduce the sampling computational complexity and improve the CS reconstruction performance. Two sub networks are included in the proposed network, which are the sampling sub-network and the reconstruction sub-network. In the sampling sub-network, the sparsity and the normalization are both considered by the limitation of the storage and the computational complexity. In order to improve the CS reconstruction performance, a reconstruction sub-network are introduced to help enhance the sampling sub-network. So by the offline iterative training of the proposed end-to-end network, the DSMM is generated for accurate measurement and excellent reconstruction. Experimental results demonstrate that the proposed DSMM outperforms GRM greatly on representative CS reconstruction methods
CVApr 13, 2018
An efficient deep convolutional laplacian pyramid architecture for CS reconstruction at low sampling ratiosWenxue Cui, Heyao Xu, Xinwei Gao et al.
The compressed sensing (CS) has been successfully applied to image compression in the past few years as most image signals are sparse in a certain domain. Several CS reconstruction models have been proposed and obtained superior performance. However, these methods suffer from blocking artifacts or ringing effects at low sampling ratios in most cases. To address this problem, we propose a deep convolutional Laplacian Pyramid Compressed Sensing Network (LapCSNet) for CS, which consists of a sampling sub-network and a reconstruction sub-network. In the sampling sub-network, we utilize a convolutional layer to mimic the sampling operator. In contrast to the fixed sampling matrices used in traditional CS methods, the filters used in our convolutional layer are jointly optimized with the reconstruction sub-network. In the reconstruction sub-network, two branches are designed to reconstruct multi-scale residual images and muti-scale target images progressively using a Laplacian pyramid architecture. The proposed LapCSNet not only integrates multi-scale information to achieve better performance but also reduces computational cost dramatically. Experimental results on benchmark datasets demonstrate that the proposed method is capable of reconstructing more details and sharper edges against the state-of-the-arts methods.
MMMar 15, 2018
A nonlinear transform based analog video transmission frameworkYongtao Liu, Xiaopeng Fan, Yang Wang et al.
Soft-cast, a cross-layer design for wireless video transmission, is proposed to solve the drawbacks of digital video transmission: threshold transmission framework achieving the same effect. Specifically, in encoder, we carry out power allocation on the transformed coefficients and encode the coefficients based on the new formulation of power distortion. In decoder, the process of LLSE estimator is also improved. Accompanied with the inverse nonlinear transform, DCT coefficients can be recovered depending on the scaling factors , LLSE estimator coefficients and metadata. Experiment results show that our proposed framework outperforms the Soft-cast in PSNR 1.08 dB and the MSSIM gain reaches to 2.35% when transmitting under the same bandwidth and total power.
CVAug 2, 2017
An End-to-End Compression Framework Based on Convolutional Neural NetworksFeng Jiang, Wen Tao, Shaohui Liu et al.
Deep learning, e.g., convolutional neural networks (CNNs), has achieved great success in image processing and computer vision especially in high level vision applications such as recognition and understanding. However, it is rarely used to solve low-level vision problems such as image compression studied in this paper. Here, we move forward a step and propose a novel compression framework based on CNNs. To achieve high-quality image compression at low bit rates, two CNNs are seamlessly integrated into an end-to-end compression framework. The first CNN, named compact convolutional neural network (ComCNN), learns an optimal compact representation from an input image, which preserves the structural information and is then encoded using an image codec (e.g., JPEG, JPEG2000 or BPG). The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high-quality in the decoding end. To make two CNNs effectively collaborate, we develop a unified end-to-end learning algorithm to simultaneously learn ComCNN and RecCNN, which facilitates the accurate reconstruction of the decoded image using RecCNN. Such a design also makes the proposed compression framework compatible with existing image coding standards. Experimental results validate that the proposed compression framework greatly outperforms several compression frameworks that use existing image coding standards with state-of-the-art deblocking or denoising post-processing methods.
CVJul 22, 2017
Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception ModuleWuzhen Shi, Feng Jiang, Debin Zhao
Traditional works have shown that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale, as well as across different scales. Make full use of these multi-scale information can improve the image restoration performance. However, the current proposed deep learning based restoration methods do not take the multi-scale information into account. In this paper, we propose a dilated convolution based inception module to learn multi-scale information and design a deep network for single image super-resolution. Different dilated convolution learns different scale feature, then the inception module concatenates all these features to fuse multi-scale information. In order to increase the reception field of our network to catch more contextual information, we cascade multiple inception modules to constitute a deep network to conduct single image super-resolution. With the novel dilated convolution based inception module, the proposed end-to-end single image super-resolution network can take advantage of multi-scale information to improve image super-resolution performance. Experimental results show that our proposed method outperforms many state-of-the-art single image super-resolution methods.
CVJul 22, 2017
Deep Networks for Compressed Image SensingWuzhen Shi, Feng Jiang, Shengping Zhang et al.
The compressed sensing (CS) theory has been successfully applied to image compression in the past few years as most image signals are sparse in a certain domain. Several CS reconstruction models have been recently proposed and obtained superior performance. However, there still exist two important challenges within the CS theory. The first one is how to design a sampling mechanism to achieve an optimal sampling efficiency, and the second one is how to perform the reconstruction to get the highest quality to achieve an optimal signal recovery. In this paper, we try to deal with these two problems with a deep network. First of all, we train a sampling matrix via the network training instead of using a traditional manually designed one, which is much appropriate for our deep network based reconstruct process. Then, we propose a deep network to recover the image, which imitates traditional compressed sensing reconstruction processes. Experimental results demonstrate that our deep networks based CS reconstruction method offers a very significant quality improvement compared against state of the art ones.
CVMar 30, 2017
Learning Convolutional Networks for Content-weighted Image CompressionMu Li, Wangmeng Zuo, Shuhang Gu et al.
Lossy image compression is generally formulated as a joint rate-distortion optimization to learn encoder, quantizer, and decoder. However, the quantizer is non-differentiable, and discrete entropy estimation usually is required for rate control. These make it very challenging to develop a convolutional network (CNN)-based image compression system. In this paper, motivated by that the local information content is spatially variant in an image, we suggest that the bit rate of the different parts of the image should be adapted to local content. And the content aware bit rate is allocated under the guidance of a content-weighted importance map. Thus, the sum of the importance map can serve as a continuous alternative of discrete entropy estimation to control compression rate. And binarizer is adopted to quantize the output of encoder due to the binarization scheme is also directly defined by the importance map. Furthermore, a proxy function is introduced for binary operation in backward propagation to make it differentiable. Therefore, the encoder, decoder, binarizer and importance map can be jointly optimized in an end-to-end manner by using a subset of the ImageNet database. In low bit rate image compression, experiments show that our system significantly outperforms JPEG and JPEG 2000 by structural similarity (SSIM) index, and can produce the much better visual result with sharp edges, rich textures, and fewer artifacts.
CVJul 7, 2016
Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG ImagesXianming Liu, Gene Cheung, Xiaolin Wu et al.
Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors---Laplacian prior for DCT coefficients, sparsity prior and graph-signal smoothness prior for image patches---to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.
CVMay 14, 2014
Group-based Sparse Representation for Image RestorationJian Zhang, Debin Zhao, Wen Gao
Traditional patch-based sparse representation modeling of natural images usually suffer from two problems. First, it has to solve a large-scale optimization problem with high computational complexity in dictionary learning. Second, each patch is considered independently in dictionary learning and sparse coding, which ignores the relationship among patches, resulting in inaccurate sparse coding coefficients. In this paper, instead of using patch as the basic unit of sparse representation, we exploit the concept of group as the basic unit of sparse representation, which is composed of nonlocal patches with similar structures, and establish a novel sparse representation modeling of natural images, called group-based sparse representation (GSR). The proposed GSR is able to sparsely represent natural images in the domain of group, which enforces the intrinsic local sparsity and nonlocal self-similarity of images simultaneously in a unified framework. Moreover, an effective self-adaptive dictionary learning method for each group with low complexity is designed, rather than dictionary learning from natural images. To make GSR tractable and robust, a split Bregman based technique is developed to solve the proposed GSR-driven minimization problem for image restoration efficiently. Extensive experiments on image inpainting, image deblurring and image compressive sensing recovery manifest that the proposed GSR modeling outperforms many current state-of-the-art schemes in both PSNR and visual perception.
MMMay 11, 2014
Image Restoration Using Joint Statistical Modeling in Space-Transform DomainJian Zhang, Debin Zhao, Ruiqin Xiong et al.
This paper presents a novel strategy for high-fidelity image restoration by characterizing both local smoothness and nonlocal self-similarity of natural images in a unified statistical manner. The main contributions are three-folds. First, from the perspective of image statistics, a joint statistical modeling (JSM) in an adaptive hybrid space-transform domain is established, which offers a powerful mechanism of combining local smoothness and nonlocal self-similarity simultaneously to ensure a more reliable and robust estimation. Second, a new form of minimization functional for solving image inverse problem is formulated using JSM under regularization-based framework. Finally, in order to make JSM tractable and robust, a new Split-Bregman based algorithm is developed to efficiently solve the above severely underdetermined inverse problem associated with theoretical proof of convergence. Extensive experiments on image inpainting, image deblurring and mixed Gaussian plus salt-and-pepper noise removal applications verify the effectiveness of the proposed algorithm.
CVApr 30, 2014
Image Compressive Sensing Recovery Using Adaptively Learned Sparsifying Basis via L0 MinimizationJian Zhang, Chen Zhao, Debin Zhao et al.
From many fewer acquired measurements than suggested by the Nyquist sampling theory, compressive sensing (CS) theory demonstrates that, a signal can be reconstructed with high probability when it exhibits sparsity in some domain. Most of the conventional CS recovery approaches, however, exploited a set of fixed bases (e.g. DCT, wavelet and gradient domain) for the entirety of a signal, which are irrespective of the non-stationarity of natural signals and cannot achieve high enough degree of sparsity, thus resulting in poor CS recovery performance. In this paper, we propose a new framework for image compressive sensing recovery using adaptively learned sparsifying basis via L0 minimization. The intrinsic sparsity of natural images is enforced substantially by sparsely representing overlapped image patches using the adaptively learned sparsifying basis in the form of L0 norm, greatly reducing blocking artifacts and confining the CS solution space. To make our proposed scheme tractable and robust, a split Bregman iteration based technique is developed to solve the non-convex L0 minimization problem efficiently. Experimental results on a wide range of natural images for CS recovery have shown that our proposed algorithm achieves significant performance improvements over many current state-of-the-art schemes and exhibits good convergence property.
CVApr 29, 2014
Structural Group Sparse Representation for Image Compressive Sensing RecoveryJian Zhang, Debin Zhao, Feng Jiang et al.
Compressive Sensing (CS) theory shows that a signal can be decoded from many fewer measurements than suggested by the Nyquist sampling theory, when the signal is sparse in some domain. Most of conventional CS recovery approaches, however, exploited a set of fixed bases (e.g. DCT, wavelet, contourlet and gradient domain) for the entirety of a signal, which are irrespective of the nonstationarity of natural signals and cannot achieve high enough degree of sparsity, thus resulting in poor rate-distortion performance. In this paper, we propose a new framework for image compressive sensing recovery via structural group sparse representation (SGSR) modeling, which enforces image sparsity and self-similarity simultaneously under a unified framework in an adaptive group domain, thus greatly confining the CS solution space. In addition, an efficient iterative shrinkage/thresholding algorithm based technique is developed to solve the above optimization problem. Experimental results demonstrate that the novel CS recovery strategy achieves significant performance improvements over the current state-of-the-art schemes and exhibits nice convergence.
CVApr 29, 2014
Spatially Directional Predictive Coding for Block-based Compressive Sensing of Natural ImagesJian Zhang, Debin Zhao, Feng Jiang
A novel coding strategy for block-based compressive sens-ing named spatially directional predictive coding (SDPC) is proposed, which efficiently utilizes the intrinsic spatial cor-relation of natural images. At the encoder, for each block of compressive sensing (CS) measurements, the optimal pre-diction is selected from a set of prediction candidates that are generated by four designed directional predictive modes. Then, the resulting residual is processed by scalar quantiza-tion (SQ). At the decoder, the same prediction is added onto the de-quantized residuals to produce the quantized CS measurements, which is exploited for CS reconstruction. Experimental results substantiate significant improvements achieved by SDPC-plus-SQ in rate distortion performance as compared with SQ alone and DPCM-plus-SQ.
CVAug 18, 2012
Image Super-Resolution via Dual-Dictionary Learning And Sparse RepresentationJian Zhang, Chen Zhao, Ruiqin Xiong et al.
Learning-based image super-resolution aims to reconstruct high-frequency (HF) details from the prior model trained by a set of high- and low-resolution image patches. In this paper, HF to be estimated is considered as a combination of two components: main high-frequency (MHF) and residual high-frequency (RHF), and we propose a novel image super-resolution method via dual-dictionary learning and sparse representation, which consists of the main dictionary learning and the residual dictionary learning, to recover MHF and RHF respectively. Extensive experimental results on test images validate that by employing the proposed two-layer progressive scheme, more image details can be recovered and much better results can be achieved than the state-of-the-art algorithms in terms of both PSNR and visual perception.
MMAug 18, 2012
Exploiting Image Local And Nonlocal Consistency For Mixed Gaussian-Impulse Noise RemovalJian Zhang, Ruiqin Xiong, Chen Zhao et al.
Most existing image denoising algorithms can only deal with a single type of noise, which violates the fact that the noisy observed images in practice are often suffered from more than one type of noise during the process of acquisition and transmission. In this paper, we propose a new variational algorithm for mixed Gaussian-impulse noise removal by exploiting image local consistency and nonlocal consistency simultaneously. Specifically, the local consistency is measured by a hyper-Laplace prior, enforcing the local smoothness of images, while the nonlocal consistency is measured by three-dimensional sparsity of similar blocks, enforcing the nonlocal self-similarity of natural images. Moreover, a Split-Bregman based technique is developed to solve the above optimization problem efficiently. Extensive experiments for mixed Gaussian plus impulse noise show that significant performance improvements over the current state-of-the-art schemes have been achieved, which substantiates the effectiveness of the proposed algorithm.
CVAug 18, 2012
Improved Total Variation based Image Compressive Sensing Recovery by Nonlocal RegularizationJian Zhang, Shaohui Liu, Debin Zhao et al.
Recently, total variation (TV) based minimization algorithms have achieved great success in compressive sensing (CS) recovery for natural images due to its virtue of preserving edges. However, the use of TV is not able to recover the fine details and textures, and often suffers from undesirable staircase artifact. To reduce these effects, this letter presents an improved TV based image CS recovery algorithm by introducing a new nonlocal regularization constraint into CS optimization problem. The nonlocal regularization is built on the well known nonlocal means (NLM) filtering and takes advantage of self-similarity in images, which helps to suppress the staircase effect and restore the fine details. Furthermore, an efficient augmented Lagrangian based algorithm is developed to solve the above combined TV and nonlocal regularization constrained problem. Experimental results demonstrate that the proposed algorithm achieves significant performance improvements over the state-of-the-art TV based algorithm in both PSNR and visual perception.