Ding-Hua Chen

CV
3papers
101citations
Novelty50%
AI Score28

3 Papers

LGJun 1, 2022Code
Strongly Augmented Contrastive Clustering

Xiaozhi Deng, Dong Huang, Ding-Hua Chen et al.

Deep clustering has attracted increasing attention in recent years due to its capability of joint representation learning and clustering via deep neural networks. In its latest developments, the contrastive learning has emerged as an effective technique to substantially enhance the deep clustering performance. However, the existing contrastive learning based deep clustering algorithms mostly focus on some carefully-designed augmentations (often with limited transformations to preserve the structure), referred to as weak augmentations, but cannot go beyond the weak augmentations to explore the more opportunities in stronger augmentations (with more aggressive transformations or even severe distortions). In this paper, we present an end-to-end deep clustering approach termed Strongly Augmented Contrastive Clustering (SACC), which extends the conventional two-augmentation-view paradigm to multiple views and jointly leverages strong and weak augmentations for strengthened deep clustering. Particularly, we utilize a backbone network with triply-shared weights, where a strongly augmented view and two weakly augmented views are incorporated. Based on the representations produced by the backbone, the weak-weak view pair and the strong-weak view pairs are simultaneously exploited for the instance-level contrastive learning (via an instance projector) and the cluster-level contrastive learning (via a cluster projector), which, together with the backbone, can be jointly optimized in a purely unsupervised manner. Experimental results on five challenging image datasets have shown the superiority of our SACC approach over the state-of-the-art. The code is available at https://github.com/dengxiaozhi/SACC.

CVJun 26, 2022
Vision Transformer for Contrastive Clustering

Hua-Bao Ling, Bowen Zhu, Dong Huang et al.

Vision Transformer (ViT) has shown its advantages over the convolutional neural network (CNN) with its ability to capture global long-range dependencies for visual representation learning. Besides ViT, contrastive learning is another popular research topic recently. While previous contrastive learning works are mostly based on CNNs, some recent studies have attempted to combine ViT and contrastive learning for enhanced self-supervised learning. Despite the considerable progress, these combinations of ViT and contrastive learning mostly focus on the instance-level contrastiveness, which often overlook the global contrastiveness and also lack the ability to directly learn the clustering result (e.g., for images). In view of this, this paper presents a novel deep clustering approach termed Vision Transformer for Contrastive Clustering (VTCC), which for the first time, to our knowledge, unifies the Transformer and the contrastive learning for the image clustering task. Specifically, with two random augmentations performed on each image, we utilize a ViT encoder with two weight-sharing views as the backbone. To remedy the potential instability of the ViT, we incorporate a convolutional stem to split each augmented sample into a sequence of patches, which uses multiple stacked small convolutions instead of a big convolution in the patch projection layer. By learning the feature representations for the sequences of patches via the backbone, an instance projector and a cluster projector are further utilized to perform the instance-level contrastive learning and the global clustering structure learning, respectively. Experiments on eight image datasets demonstrate the stability (during the training-from-scratch) and the superiority (in clustering performance) of our VTCC approach over the state-of-the-art.

CVJun 1, 2022
DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks

Dong Huang, Ding-Hua Chen, Xiangji Chen et al.

Deep clustering has recently emerged as a promising technique for complex data clustering. Despite the considerable progress, previous deep clustering works mostly build or learn the final clustering by only utilizing a single layer of representation, e.g., by performing the K-means clustering on the last fully-connected layer or by associating some clustering loss to a specific layer, which neglect the possibilities of jointly leveraging multi-layer representations for enhancing the deep clustering performance. In view of this, this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks. In particular, we utilize a weight-sharing convolutional neural network as the backbone, which is trained with both the instance-level contrastive learning (via an instance projector) and the cluster-level contrastive learning (via a cluster projector) in an unsupervised manner. Thereafter, multiple layers of feature representations are extracted from the trained network, upon which the ensemble clustering process is further conducted. Specifically, a set of diversified base clusterings are generated from the multi-layer representations via a highly efficient clusterer. Then the reliability of clusters in multiple base clusterings is automatically estimated by exploiting an entropy-based criterion, based on which the set of base clusterings are re-formulated into a weighted-cluster bipartite graph. By partitioning this bipartite graph via transfer cut, the final consensus clustering can be obtained. Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.