LGOct 13, 2022Code
Variational Graph Generator for Multi-View Graph ClusteringJianpeng Chen, Yawen Ling, Jie Xu et al.
Multi-view graph clustering (MGC) methods are increasingly being studied due to the explosion of multi-view data with graph structural information. The critical point of MGC is to better utilize view-specific and view-common information in features and graphs of multiple views. However, existing works have an inherent limitation that they are unable to concurrently utilize the consensus graph information across multiple graphs and the view-specific feature information. To address this issue, we propose Variational Graph Generator for Multi-View Graph Clustering (VGMGC). Specifically, a novel variational graph generator is proposed to extract common information among multiple graphs. This generator infers a reliable variational consensus graph based on a priori assumption over multiple graphs. Then a simple yet effective graph encoder in conjunction with the multi-view clustering objective is presented to learn the desired graph embeddings for clustering, which embeds the inferred view-common graph and view-specific graphs together with features. Finally, theoretical results illustrate the rationality of the VGMGC by analyzing the uncertainty of the inferred consensus graph with the information bottleneck principle.Extensive experiments demonstrate the superior performance of our VGMGC over SOTAs. The source code is publicly available at https://github.com/cjpcool/VGMGC.
LGOct 9, 2022
Deep Clustering: A Comprehensive SurveyYazhou Ren, Jingyu Pu, Zhimeng Yang et al.
Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey for deep clustering in views of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.
LGMay 5, 2022
Contrastive Multi-view Hyperbolic Hierarchical ClusteringFangfei Lin, Bing Bai, Kun Bai et al.
Hierarchical clustering recursively partitions data at an increasingly finer granularity. In real-world applications, multi-view data have become increasingly important. This raises a less investigated problem, i.e., multi-view hierarchical clustering, to better understand the hierarchical structure of multi-view data. To this end, we propose a novel neural network-based model, namely Contrastive Multi-view Hyperbolic Hierarchical Clustering (CMHHC). It consists of three components, i.e., multi-view alignment learning, aligned feature similarity learning, and continuous hyperbolic hierarchical clustering. First, we align sample-level representations across multiple views in a contrastive way to capture the view-invariance information. Next, we utilize both the manifold and Euclidean similarities to improve the metric property. Then, we embed the representations into a hyperbolic space and optimize the hyperbolic embeddings via a continuous relaxation of hierarchical clustering loss. Finally, a binary clustering tree is decoded from optimized hyperbolic embeddings. Experimental results on five real-world datasets demonstrate the effectiveness of the proposed method and its components.
LGSep 25, 2023
A Novel Approach for Effective Multi-View Clustering with Information-Theoretic PerspectiveChenhang Cui, Yazhou Ren, Jingyu Pu et al.
Multi-view clustering (MVC) is a popular technique for improving clustering performance using various data sources. However, existing methods primarily focus on acquiring consistent information while often neglecting the issue of redundancy across multiple views. This study presents a new approach called Sufficient Multi-View Clustering (SUMVC) that examines the multi-view clustering framework from an information-theoretic standpoint. Our proposed method consists of two parts. Firstly, we develop a simple and reliable multi-view clustering method SCMVC (simple consistent multi-view clustering) that employs variational analysis to generate consistent information. Secondly, we propose a sufficient representation lower bound to enhance consistent information and minimise unnecessary information among views. The proposed SUMVC method offers a promising solution to the problem of multi-view clustering and provides a new perspective for analyzing multi-view data. To verify the effectiveness of our model, we conducted a theoretical analysis based on the Bayes Error Rate, and experiments on multiple multi-view datasets demonstrate the superior performance of SUMVC.
LGSep 24, 2023
Federated Deep Multi-View Clustering with Global Self-SupervisionXinyue Chen, Jie Xu, Yazhou Ren et al.
Federated multi-view clustering has the potential to learn a global clustering model from data distributed across multiple devices. In this setting, label information is unknown and data privacy must be preserved, leading to two major challenges. First, views on different clients often have feature heterogeneity, and mining their complementary cluster information is not trivial. Second, the storage and usage of data from multiple clients in a distributed environment can lead to incompleteness of multi-view data. To address these challenges, we propose a novel federated deep multi-view clustering method that can mine complementary cluster structures from multiple clients, while dealing with data incompleteness and privacy concerns. Specifically, in the server environment, we propose sample alignment and data extension techniques to explore the complementary cluster structures of multiple views. The server then distributes global prototypes and global pseudo-labels to each client as global self-supervised information. In the client environment, multiple clients use the global self-supervised information and deep autoencoders to learn view-specific cluster assignments and embedded features, which are then uploaded to the server for refining the global self-supervised information. Finally, the results of our extensive experiments demonstrate that our proposed method exhibits superior performance in addressing the challenges of incomplete multi-view data in distributed environments.
LGMar 30, 2023
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View ScenariosJie Xu, Yazhou Ren, Xiaolong Wang et al.
Multi-view clustering (MVC) aims at exploring category structures among multi-view data in self-supervised manners. Multiple views provide more information than single views and thus existing MVC methods can achieve satisfactory performance. However, their performance might seriously degenerate when the views are noisy in practical multi-view scenarios. In this paper, we formally investigate the drawback of noisy views and then propose a theoretically grounded deep MVC method (namely MVCAN) to address this issue. Specifically, we propose a novel MVC objective that enables un-shared parameters and inconsistent clustering predictions across multiple views to reduce the side effects of noisy views. Furthermore, a two-level multi-view iterative optimization is designed to generate robust learning targets for refining individual views' representation learning. Theoretical analysis reveals that MVCAN works by achieving the multi-view consistency, complementarity, and noise robustness. Finally, experiments on extensive public datasets demonstrate that MVCAN outperforms state-of-the-art methods and is robust against the existence of noisy views.
CVOct 11, 2022
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human ActivitiesTerry Yue Zhuo, Yaqing Liao, Yuecheng Lei et al.
We introduce ViLPAct, a novel vision-language benchmark for human activity planning. It is designed for a task where embodied AI agents can reason and forecast future actions of humans based on video clips about their initial activities and intents in text. The dataset consists of 2.9k videos from \charades extended with intents via crowdsourcing, a multi-choice question test set, and four strong baselines. One of the baselines implements a neurosymbolic approach based on a multi-modal knowledge base (MKB), while the other ones are deep generative models adapted from recent state-of-the-art (SOTA) methods. According to our extensive experiments, the key challenges are compositional generalization and effective use of information from both modalities.
LGMay 8, 2022
Deep Embedded Multi-View Clustering via Jointly Learning Latent Representations and GraphsZongmo Huang, Yazhou Ren, Xiaorong Pu et al.
With the representation learning capability of the deep learning models, deep embedded multi-view clustering (MVC) achieves impressive performance in many scenarios and has become increasingly popular in recent years. Although great progress has been made in this field, most existing methods merely focus on learning the latent representations and ignore that learning the latent graph of nodes also provides available information for the clustering task. To address this issue, in this paper we propose Deep Embedded Multi-view Clustering via Jointly Learning Latent Representations and Graphs (DMVCJ), which utilizes the latent graphs to promote the performance of deep embedded MVC models from two aspects. Firstly, by learning the latent graphs and feature representations jointly, the graph convolution network (GCN) technique becomes available for our model. With the capability of GCN in exploiting the information from both graphs and features, the clustering performance of our model is significantly promoted. Secondly, based on the adjacency relations of nodes shown in the latent graphs, we design a sample-weighting strategy to alleviate the noisy issue, and further improve the effectiveness and robustness of the model. Experimental results on different types of real-world multi-view datasets demonstrate the effectiveness of DMVCJ.
CVMar 21, 2023
Self-Paced Neutral Expression-Disentangled Learning for Facial Expression RecognitionZhenqian Wu, Xiaoyuan Li, Yazhou Ren et al.
The accuracy of facial expression recognition is typically affected by the following factors: high similarities across different expressions, disturbing factors, and micro-facial movement of rapid and subtle changes. One potentially viable solution for addressing these barriers is to exploit the neutral information concealed in neutral expression images. To this end, in this paper we propose a self-Paced Neutral Expression-Disentangled Learning (SPNDL) model. SPNDL disentangles neutral information from facial expressions, making it easier to extract key and deviation features. Specifically, it allows to capture discriminative information among similar expressions and perceive micro-facial movements. In order to better learn these neutral expression-disentangled features (NDFs) and to alleviate the non-convex optimization problem, a self-paced learning (SPL) strategy based on NDFs is proposed in the training stage. SPL learns samples from easy to complex by increasing the number of samples selected into the training process, which enables to effectively suppress the negative impacts introduced by low-quality samples and inconsistently distributed NDFs. Experiments on three popular databases (i.e., CK+, Oulu-CASIA, and RAF-DB) show the effectiveness of our proposed method.
LGOct 12, 2024Code
Bridging Gaps: Federated Multi-View Clustering in Heterogeneous Hybrid ViewsXinyue Chen, Yazhou Ren, Jie Xu et al.
Recently, federated multi-view clustering (FedMVC) has emerged to explore cluster structures in multi-view data distributed on multiple clients. Existing approaches often assume that clients are isomorphic and all of them belong to either single-view clients or multi-view clients. Despite their success, these methods also present limitations when dealing with practical FedMVC scenarios involving heterogeneous hybrid views, where a mixture of both single-view and multi-view clients exhibit varying degrees of heterogeneity. In this paper, we propose a novel FedMVC framework, which concurrently addresses two challenges associated with heterogeneous hybrid views, i.e., client gap and view gap. To address the client gap, we design a local-synergistic contrastive learning approach that helps single-view clients and multi-view clients achieve consistency for mitigating heterogeneity among all clients. To address the view gap, we develop a global-specific weighting aggregation method, which encourages global models to learn complementary features from hybrid views. The interplay between local-synergistic contrastive learning and global-specific weighting aggregation mutually enhances the exploration of the data cluster structures distributed on multiple clients. Theoretical analysis and extensive experiments demonstrate that our method can handle the heterogeneous hybrid views in FedMVC and outperforms state-of-the-art methods. The code is available at \url{https://github.com/5Martina5/FMCSC}.
LGMar 14, 2024Code
S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor ClusteringZhen Long, Qiyuan Wang, Yazhou Ren et al.
Anchor-based large-scale multi-view clustering has attracted considerable attention for its effectiveness in handling massive datasets. However, current methods mainly seek the consensus embedding feature for clustering by exploring global correlations between anchor graphs or projection matrices.In this paper, we propose a simple yet efficient scalable multi-view tensor clustering (S^2MVTC) approach, where our focus is on learning correlations of embedding features within and across views. Specifically, we first construct the embedding feature tensor by stacking the embedding features of different views into a tensor and rotating it. Additionally, we build a novel tensor low-frequency approximation (TLFA) operator, which incorporates graph similarity into embedding feature learning, efficiently achieving smooth representation of embedding features within different views. Furthermore, consensus constraints are applied to embedding features to ensure inter-view semantic consistency. Experimental results on six large-scale multi-view datasets demonstrate that S^2MVTC significantly outperforms state-of-the-art algorithms in terms of clustering performance and CPU execution time, especially when handling massive data. The code of S^2MVTC is publicly available at https://github.com/longzhen520/S2MVTC.
CVMay 16, 2023Code
Multi-view MERA Subspace ClusteringZhen Long, Ce Zhu, Jie Chen et al.
Tensor-based multi-view subspace clustering (MSC) can capture high-order correlation in the self-representation tensor. Current tensor decompositions for MSC suffer from highly unbalanced unfolding matrices or rotation sensitivity, failing to fully explore inter/intra-view information. Using the advanced tensor network, namely, multi-scale entanglement renormalization ansatz (MERA), we propose a low-rank MERA based MSC (MERA-MSC) algorithm, where MERA factorizes a tensor into contractions of one top core factor and the rest orthogonal/semi-orthogonal factors. Benefiting from multiple interactions among orthogonal/semi-orthogonal (low-rank) factors, the low-rank MERA has a strong representation power to capture the complex inter/intra-view information in the self-representation tensor. The alternating direction method of multipliers is adopted to solve the optimization model. Experimental results on five multi-view datasets demonstrate MERA-MSC has superiority against the compared algorithms on six evaluation metrics. Furthermore, we extend MERA-MSC by incorporating anchor learning to develop a scalable low-rank MERA based multi-view clustering method (sMREA-MVC). The effectiveness and efficiency of sMERA-MVC have been validated on three large-scale multi-view datasets. To our knowledge, this is the first work to introduce MERA to the multi-view clustering topic. The codes of MERA-MSC and sMERA-MVC are publicly available at https://github.com/longzhen520/MERA-MSC.
CVDec 13, 2021Code
Self-Paced Deep Regression Forests with Consideration of Ranking FairnessLili Pan, Mingming Meng, Yazhou Ren et al.
Deep discriminative models (DDMs), e.g. deep regression forests and deep decision forests, have been extensively studied recently to solve problems such as facial age estimation, head pose estimation, etc.. Due to a shortage of well-labeled data that does not have noise and imbalanced distribution problems, learning DDMs is always challenging. Existing methods usually tackle these challenges through learning more discriminative features or re-weighting samples. We argue that learning DDMs gradually, from easy to hard, is more reasonable, for two reasons. First, this is more consistent with the cognitive process of human beings. Second, noisy as well as underrepresented examples can be distinguished by virtue of previously learned knowledge. Thus, we resort to a gradual learning strategy -- self-paced learning (SPL). Then, a natural question arises: can SPL lead DDMs to achieve more robust and less biased solutions? To answer this question, this paper proposes a new SPL method: easy and underrepresented examples first, for learning DDMs. This tackles the fundamental ranking and selection problem in SPL from a new perspective: fairness. Our idea is fundamental and can be easily combined with a variety of DDMs. Extensive experimental results on three computer vision tasks, i.e., facial age estimation, head pose estimation, and gaze estimation, show our new method gains considerable performance improvement in both accuracy and fairness. Source code is available at https://github.com/learninginvision/SPU.
LGJun 21, 2021Code
Customizing Graph Neural Networks using Path ReweightingJianpeng Chen, Yujing Wang, Ming Zeng et al.
Graph Neural Networks (GNNs) have been extensively used for mining graph-structured data with impressive performance. However, because these traditional GNNs do not distinguish among various downstream tasks, embeddings embedded by them are not always effective. Intuitively, paths in a graph imply different semantics for different downstream tasks. Inspired by this, we design a novel GNN solution, namely Customized Graph Neural Network with Path Reweighting (CustomGNN for short). Specifically, the proposed CustomGNN can automatically learn the high-level semantics for specific downstream tasks to highlight semantically relevant paths as well to filter out task-irrelevant noises in a graph. Furthermore, we empirically analyze the semantics learned by CustomGNN and demonstrate its ability to avoid the three inherent problems in traditional GNNs, i.e., over-smoothing, poor robustness, and overfitting. In experiments with the node classification task, CustomGNN achieves state-of-the-art accuracies on three standard graph datasets and four large graph datasets. The source code of the proposed CustomGNN is available at \url{https://github.com/cjpcool/CustomGNN}.
IVMay 15, 2021Code
GCN-MIF: Graph Convolutional Network with Multi-Information Fusion for Low-dose CT DenoisingKecheng Chen, Jiayu Sun, Jiang Shen et al.
Being low-level radiation exposure and less harmful to health, low-dose computed tomography (LDCT) has been widely adopted in the early screening of lung cancer and COVID-19. LDCT images inevitably suffer from the degradation problem caused by complex noises. It was reported that deep learning (DL)-based LDCT denoising methods using convolutional neural network (CNN) achieved impressive denoising performance. Although most existing DL-based methods (e.g., encoder-decoder framework) can implicitly utilize non-local and contextual information via downsampling operator and 3D CNN, the explicit multi-information (i.e., local, non-local, and contextual) integration may not be explored enough. To address this issue, we propose a novel graph convolutional network-based LDCT denoising model, namely GCN-MIF, to explicitly perform multi-information fusion for denoising purpose. Concretely, by constructing intra- and inter-slice graph, the graph convolutional network is introduced to leverage the non-local and contextual relationships among pixels. The traditional CNN is adopted for the extraction of local information. Finally, the proposed GCN-MIF model fuses all the extracted local, non-local, and contextual information. Extensive experiments show the effectiveness of our proposed GCN-MIF model by quantitative and visualized results. Furthermore, a double-blind reader study on a public clinical dataset is also performed to validate the usability of denoising results in terms of the structural fidelity, the noise suppression, and the overall score. Models and code are available at https://github.com/tonyckc/GCN-MIF_demo.
LGMar 28, 2021Code
Self-Supervised Discriminative Feature Learning for Deep Multi-View ClusteringJie Xu, Yazhou Ren, Huayi Tang et al.
Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, there are few methods to consider the negative impact caused by certain views with unclear clustering structures, resulting in poor multi-view clustering performance. To address this drawback, we propose self-supervised discriminative feature learning for deep multi-view clustering (SDMVC). Concretely, deep autoencoders are applied to learn embedded features for each view independently. To leverage the multi-view complementary information, we concatenate all views' embedded features to form the global features, which can overcome the negative impact of some views' unclear clustering structures. In a self-supervised manner, pseudo-labels are obtained to build a unified target distribution to perform multi-view discriminative feature learning. During this process, global discriminative information can be mined to supervise all views to learn more discriminative features, which in turn are used to update the target distribution. Besides, this unified target distribution can make SDMVC learn consistent cluster assignments, which accomplishes the clustering consistency of multiple views while preserving their features' diversity. Experiments on various types of multi-view datasets show that SDMVC outperforms 14 competitors including classic and state-of-the-art methods. The code is available at https://github.com/SubmissionsIn/SDMVC.
LGJan 5, 2024
Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph ClusteringZichen Wen, Yawen Ling, Yazhou Ren et al.
Recently there is a growing focus on graph data, and multi-view graph clustering has become a popular area of research interest. Most of the existing methods are only applicable to homophilous graphs, yet the extensive real-world graph data can hardly fulfill the homophily assumption, where the connected nodes tend to belong to the same class. Several studies have pointed out that the poor performance on heterophilous graphs is actually due to the fact that conventional graph neural networks (GNNs), which are essentially low-pass filters, discard information other than the low-frequency information on the graph. Nevertheless, on certain graphs, particularly heterophilous ones, neglecting high-frequency information and focusing solely on low-frequency information impedes the learning of node representations. To break this limitation, our motivation is to perform graph filtering that is closely related to the homophily degree of the given graph, with the aim of fully leveraging both low-frequency and high-frequency signals to learn distinguishable node embedding. In this work, we propose Adaptive Hybrid Graph Filter for Multi-View Graph Clustering (AHGFC). Specifically, a graph joint process and graph joint aggregation matrix are first designed by using the intrinsic node features and adjacency relationship, which makes the low and high-frequency signals on the graph more distinguishable. Then we design an adaptive hybrid graph filter that is related to the homophily degree, which learns the node embedding based on the graph joint aggregation matrix. After that, the node embedding of each view is weighted and fused into a consensus embedding for the downstream task. Experimental results show that our proposed model performs well on six datasets containing homophilous and heterophilous graphs.
LGOct 30, 2024
Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph ClusteringZichen Wen, Tianyi Wu, Yazhou Ren et al.
Multi-view clustering is an important machine learning task for multi-media data, encompassing various domains such as images, videos, and texts. Moreover, with the growing abundance of graph data, the significance of multi-view graph clustering (MVGC) has become evident. Most existing methods focus on graph neural networks (GNNs) to extract information from both graph structure and feature data to learn distinguishable node representations. However, traditional GNNs are designed with the assumption of homophilous graphs, making them unsuitable for widely prevalent heterophilous graphs. Several techniques have been introduced to enhance GNNs for heterophilous graphs. While these methods partially mitigate the heterophilous graph issue, they often neglect the advantages of traditional GNNs, such as their simplicity, interpretability, and efficiency. In this paper, we propose a novel multi-view graph clustering method based on dual-optimized adaptive graph reconstruction, named DOAGC. It mainly aims to reconstruct the graph structure adapted to traditional GNNs to deal with heterophilous graph issues while maintaining the advantages of traditional GNNs. Specifically, we first develop an adaptive graph reconstruction mechanism that accounts for node correlation and original structural information. To further optimize the reconstruction graph, we design a dual optimization strategy and demonstrate the feasibility of our optimization strategy through mutual information theory. Numerous experiments demonstrate that DOAGC effectively mitigates the heterophilous graph problem.
LGJul 24, 2025
ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal MemoryJianchao Wang, Qingfeng Li, Pengcheng Zheng et al.
Training deep neural networks on real-world datasets is often hampered by the presence of noisy labels, which can be memorized by over-parameterized models, leading to significant degradation in generalization performance. While existing methods for learning with noisy labels (LNL) have made considerable progress, they fundamentally suffer from static snapshot evaluations and fail to leverage the rich temporal dynamics of learning evolution. In this paper, we propose ChronoSelect (chrono denoting its temporal nature), a novel framework featuring an innovative four-stage memory architecture that compresses prediction history into compact temporal distributions. Our unique sliding update mechanism with controlled decay maintains only four dynamic memory units per sample, progressively emphasizing recent patterns while retaining essential historical knowledge. This enables precise three-way sample partitioning into clean, boundary, and noisy subsets through temporal trajectory analysis and dual-branch consistency. Theoretical guarantees prove the mechanism's convergence and stability under noisy conditions. Extensive experiments demonstrate ChronoSelect's state-of-the-art performance across synthetic and real-world benchmarks.
IVApr 15, 2025
Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven PriorPengcheng Zheng, Kecheng Chen, Jiaxin Huang et al.
Medical image restoration tasks aim to recover high-quality images from degraded observations, exhibiting emergent desires in many clinical scenarios, such as low-dose CT image denoising, MRI super-resolution, and MRI artifact removal. Despite the success achieved by existing deep learning-based restoration methods with sophisticated modules, they struggle with rendering computationally-efficient reconstruction results. Moreover, they usually ignore the reliability of the restoration results, which is much more urgent in medical systems. To alleviate these issues, we present LRformer, a Lightweight Transformer-based method via Reliability-guided learning in the frequency domain. Specifically, inspired by the uncertainty quantification in Bayesian neural networks (BNNs), we develop a Reliable Lesion-Semantic Prior Producer (RLPP). RLPP leverages Monte Carlo (MC) estimators with stochastic sampling operations to generate sufficiently-reliable priors by performing multiple inferences on the foundational medical image segmentation model, MedSAM. Additionally, instead of directly incorporating the priors in the spatial domain, we decompose the cross-attention (CA) mechanism into real symmetric and imaginary anti-symmetric parts via fast Fourier transform (FFT), resulting in the design of the Guided Frequency Cross-Attention (GFCA) solver. By leveraging the conjugated symmetric property of FFT, GFCA reduces the computational complexity of naive CA by nearly half. Extensive experimental results in various tasks demonstrate the superiority of the proposed LRformer in both effectiveness and efficiency.
IVAug 3, 2025
Joint Lossless Compression and Steganography for Medical Images via Large Language ModelsPengcheng Zheng, Xiaorong Pu, Kecheng Chen et al.
Recently, large language models (LLMs) have driven promising progress in lossless image compression. However, directly adopting existing paradigms for medical images suffers from an unsatisfactory trade-off between compression performance and efficiency. Moreover, existing LLM-based compressors often overlook the security of the compression process, which is critical in modern medical scenarios. To this end, we propose a novel joint lossless compression and steganography framework. Inspired by bit plane slicing (BPS), we find it feasible to securely embed privacy messages into medical images in an invisible manner. Based on this insight, an adaptive modalities decomposition strategy is first devised to partition the entire image into two segments, providing global and local modalities for subsequent dual-path lossless compression. During this dual-path stage, we innovatively propose a segmented message steganography algorithm within the local modality path to ensure the security of the compression process. Coupled with the proposed anatomical priors-based low-rank adaptation (A-LoRA) fine-tuning strategy, extensive experimental results demonstrate the superiority of our proposed method in terms of compression ratios, efficiency, and security. The source code will be made publicly available.
LGMay 11, 2023
Deep Multi-View Subspace Clustering with Anchor GraphChenhang Cui, Yazhou Ren, Jingyu Pu et al.
Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). To be specific, DMCAG firstly learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct an anchor graph with small size for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy to keep the consistency of the labels based on contrastive learning to enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
SPSep 9, 2021
EEGDnet: Fusing Non-Local and Local Self-Similarity for 1-D EEG Signal Denoising with 2-D TransformerPeng Yi, Kecheng Chen, Zhaoqi Ma et al.
Electroencephalogram (EEG) has shown a useful approach to produce a brain-computer interface (BCI). One-dimensional (1-D) EEG signal is yet easily disturbed by certain artifacts (a.k.a. noise) due to the high temporal resolution. Thus, it is crucial to remove the noise in received EEG signal. Recently, deep learning-based EEG signal denoising approaches have achieved impressive performance compared with traditional ones. It is well known that the characteristics of self-similarity (including non-local and local ones) of data (e.g., natural images and time-domain signals) are widely leveraged for denoising. However, existing deep learning-based EEG signal denoising methods ignore either the non-local self-similarity (e.g., 1-D convolutional neural network) or local one (e.g., fully connected network and recurrent neural network). To address this issue, we propose a novel 1-D EEG signal denoising network with 2-D transformer, namely EEGDnet. Specifically, we comprehensively take into account the non-local and local self-similarity of EEG signal through the transformer module. By fusing non-local self-similarity in self-attention blocks and local self-similarity in feed forward blocks, the negative impact caused by noises and outliers can be reduced significantly. Extensive experiments show that, compared with other state-of-the-art models, EEGDnet achieves much better performance in terms of both quantitative and qualitative metrics.
CVJun 21, 2021
Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view ClusteringJie Xu, Yazhou Ren, Huayi Tang et al.
Multi-view clustering, a long-standing and important research problem, focuses on mining complementary information from diverse views. However, existing works often fuse multiple views' representations or handle clustering in a common feature space, which may result in their entanglement especially for visual representations. To address this issue, we present a novel VAE-based multi-view clustering framework (Multi-VAE) by learning disentangled visual representations. Concretely, we define a view-common variable and multiple view-peculiar variables in the generative model. The prior of view-common variable obeys approximately discrete Gumbel Softmax distribution, which is introduced to extract the common cluster factor of multiple views. Meanwhile, the prior of view-peculiar variable follows continuous Gaussian distribution, which is used to represent each view's peculiar visual factors. By controlling the mutual information capacity to disentangle the view-common and view-peculiar representations, continuous visual information of multiple views can be separated so that their common discrete cluster information can be effectively mined. Experimental results demonstrate that Multi-VAE enjoys the disentangled and explainable visual representations, while obtaining superior clustering performance compared with state-of-the-art methods.
LGJun 21, 2021
Multi-level Feature Learning for Contrastive Multi-view ClusteringJie Xu, Huayi Tang, Yazhou Ren et al.
Multi-view clustering can explore common semantics from multiple views and has attracted increasing attention. However, existing works punish multiple objectives in the same feature space, where they ignore the conflict between learning consistent common semantics and reconstructing inconsistent view-private information. In this paper, we propose a new framework of multi-level feature learning for contrastive multi-view clustering to address the aforementioned issue. Our method learns different levels of features from the raw features, including low-level features, high-level features, and semantic labels/features in a fusion-free manner, so that it can effectively achieve the reconstruction objective and the consistency objectives in different feature spaces. Specifically, the reconstruction objective is conducted on the low-level features. Two consistency objectives based on contrastive learning are conducted on the high-level features and the semantic labels, respectively. They make the high-level features effectively explore the common semantics and the semantic labels achieve the multi-view clustering. As a result, the proposed framework can reduce the adverse influence of view-private information. Extensive experiments on public datasets demonstrate that our method achieves state-of-the-art clustering effectiveness.
LGApr 19, 2021
Non-Linear Fusion for Self-Paced Multi-View ClusteringZongmo Huang, Yazhou Ren, Xiaorong Pu et al.
With the advance of the multi-media and multi-modal data, multi-view clustering (MVC) has drawn increasing attentions recently. In this field, one of the most crucial challenges is that the characteristics and qualities of different views usually vary extensively. Therefore, it is essential for MVC methods to find an effective approach that handles the diversity of multiple views appropriately. To this end, a series of MVC methods focusing on how to integrate the loss from each view have been proposed in the past few years. Among these methods, the mainstream idea is assigning weights to each view and then combining them linearly. In this paper, inspired by the effectiveness of non-linear combination in instance learning and the auto-weighted approaches, we propose Non-Linear Fusion for Self-Paced Multi-View Clustering (NSMVC), which is totally different from the the conventional linear-weighting algorithms. In NSMVC, we directly assign different exponents to different views according to their qualities. By this way, the negative impact from the corrupt views can be significantly reduced. Meanwhile, to address the non-convex issue of the MVC model, we further define a novel regularizer-free modality of Self-Paced Learning (SPL), which fits the proposed non-linear model perfectly. Experimental results on various real-world data sets demonstrate the effectiveness of the proposed method.
CVApr 18, 2021
Lesion-Inspired Denoising Network: Connecting Medical Image Denoising and Lesion DetectionKecheng Chen, Kun Long, Yazhou Ren et al.
Deep learning has achieved notable performance in the denoising task of low-quality medical images and the detection task of lesions, respectively. However, existing low-quality medical image denoising approaches are disconnected from the detection task of lesions. Intuitively, the quality of denoised images will influence the lesion detection accuracy that in turn can be used to affect the denoising performance. To this end, we propose a play-and-plug medical image denoising framework, namely Lesion-Inspired Denoising Network (LIDnet), to collaboratively improve both denoising performance and detection accuracy of denoised medical images. Specifically, we propose to insert the feedback of downstream detection task into existing denoising framework by jointly learning a multi-loss objective. Instead of using perceptual loss calculated on the entire feature map, a novel region-of-interest (ROI) perceptual loss induced by the lesion detection task is proposed to further connect these two tasks. To achieve better optimization for overall framework, we propose a customized collaborative training strategy for LIDnet. On consideration of clinical usability and imaging characteristics, three low-dose CT images datasets are used to evaluate the effectiveness of the proposed LIDnet. Experiments show that, by equipping with LIDnet, both of the denoising and lesion detection performance of baseline methods can be significantly improved.
LGJul 26, 2020
Deep Embedded Multi-view Clustering with Collaborative TrainingJie Xu, Yazhou Ren, Guofeng Li et al.
Multi-view clustering has attracted increasing attentions recently by utilizing information from multiple views. However, existing multi-view clustering methods are either with high computation and space complexities, or lack of representation capability. To address these issues, we propose deep embedded multi-view clustering with collaborative training (DEMVC) in this paper. Firstly, the embedded representations of multiple views are learned individually by deep autoencoders. Then, both consensus and complementary of multiple views are taken into account and a novel collaborative training scheme is proposed. Concretely, the feature representations and cluster assignments of all views are learned collaboratively. A new consistency strategy for cluster centers initialization is further developed to improve the multi-view clustering performance with collaborative training. Experimental results on several popular multi-view datasets show that DEMVC achieves significant improvements over state-of-the-art methods.
CVApr 3, 2020
Self-Paced Deep Regression Forests with Consideration on Underrepresented ExamplesLili Pan, Shijie Ai, Yazhou Ren et al.
Deep discriminative models (e.g. deep regression forests, deep neural decision forests) have achieved remarkable success recently to solve problems such as facial age estimation and head pose estimation. Most existing methods pursue robust and unbiased solutions either through learning discriminative features, or reweighting samples. We argue what is more desirable is learning gradually to discriminate like our human beings, and hence we resort to self-paced learning (SPL). Then, a natural question arises: can self-paced regime lead deep discriminative models to achieve more robust and less biased solutions? To this end, this paper proposes a new deep discriminative model--self-paced deep regression forests with consideration on underrepresented examples (SPUDRFs). It tackles the fundamental ranking and selecting problem in SPL from a new perspective: fairness. This paradigm is fundamental and could be easily combined with a variety of deep discriminative models (DDMs). Extensive experiments on two computer vision tasks, i.e., facial age estimation and head pose estimation, demonstrate the efficacy of SPUDRFs, where state-of-the-art performances are achieved.
CVOct 8, 2019
Self-Paced Deep Regression Forests for Facial Age EstimationShijie Ai, Lili Pan, Yazhou Ren
Facial age estimation is an important and challenging problem in computer vision. Existing approaches usually employ deep neural networks (DNNs) to fit the mapping from facial features to age, even though there exist some noisy and confusing samples. We argue that it is more desirable to distinguish noisy and confusing facial images from regular ones, and alleviate the interference arising from them. To this end, we propose self-paced deep regression forests (SP-DRFs) -- a gradual learning DNNs framework for age estimation. As the model is learned gradually, from simplicity to complexity, it tends to emphasize more on reliable samples and avoid bad local minima. Moreover, the proposed capped-likelihood function helps to exclude noisy samples in training, rendering our SP-DRFs significantly more robust. We demonstrate the efficacy of SP-DRFs on Morph II and FG-NET datasets, where our model achieves state-of-the-art performance.
LGMay 11, 2019
Ranking-based Deep Cross-modal HashingXuanwu Liu, Guoxian Yu, Carlotta Domeniconi et al.
Cross-modal hashing has been receiving increasing interests for its low storage cost and fast query speed in multi-modal data retrievals. However, most existing hashing methods are based on hand-crafted or raw level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH firstly uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications.
LGDec 17, 2018
Latent Dirichlet Allocation in Generative Adversarial NetworksLili Pan, Shen Cheng, Jian Liu et al.
We study the problem of multimodal generative modelling of images based on generative adversarial networks (GANs). Despite the success of existing methods, they often ignore the underlying structure of vision data or its multimodal generation characteristics. To address this problem, we introduce the Dirichlet prior for multimodal image generation, which leads to a new Latent Dirichlet Allocation based GAN (LDAGAN). In detail, for the generative process modelling, LDAGAN defines a generative mode for each sample, determining which generative sub-process it belongs to. For the adversarial training, LDAGAN derives a variational expectation-maximization (VEM) algorithm to estimate model parameters. Experimental results on real-world datasets have demonstrated the outstanding performance of LDAGAN over other existing GANs.
LGDec 11, 2018
Deep Density-based Image ClusteringYazhou Ren, Ni Wang, Mingxia Li et al.
Recently, deep clustering, which is able to perform feature learning that favors clustering tasks via deep neural networks, has achieved remarkable performance in image clustering applications. However, the existing deep clustering algorithms generally need the number of clusters in advance, which is usually unknown in real-world tasks. In addition, the initial cluster centers in the learned feature space are generated by $k$-means. This only works well on spherical clusters and probably leads to unstable clustering results. In this paper, we propose a two-stage deep density-based image clustering (DDC) framework to address these issues. The first stage is to train a deep convolutional autoencoder (CAE) to extract low-dimensional feature representations from high-dimensional image data, and then apply t-SNE to further reduce the data to a 2-dimensional space favoring density-based clustering algorithms. The second stage is to apply the developed density-based clustering technique on the 2-dimensional embedded data to automatically recognize an appropriate number of clusters with arbitrary shapes. Concretely, a number of local clusters are generated to capture the local structures of clusters, and then are merged via their density relationship to form the final clustering result. Experiments demonstrate that the proposed DDC achieves comparable or even better clustering performance than state-of-the-art deep clustering methods, even though the number of clusters is not given.
LGAug 24, 2018
Self-Paced Multi-Task ClusteringYazhou Ren, Xiaofan Que, Dezhong Yao et al.
Multi-task clustering (MTC) has attracted a lot of research attentions in machine learning due to its ability in utilizing the relationship among different tasks. Despite the success of traditional MTC models, they are either easy to stuck into local optima, or sensitive to outliers and noisy data. To alleviate these problems, we propose a novel self-paced multi-task clustering (SPMTC) paradigm. In detail, SPMTC progressively selects data examples to train a series of MTC models with increasing complexity, thus highly decreases the risk of trapping into poor local optima. Furthermore, to reduce the negative influence of outliers and noisy data, we design a soft version of SPMTC to further improve the clustering performance. The corresponding SPMTC framework can be easily solved by an alternating optimization method. The proposed model is guaranteed to converge and experiments on real data sets have demonstrated its promising results compared with state-of-the-art multi-task clustering methods.