Mulin Chen

CV
h-index: 21
Papers: 23
Citations: 362
Novelty: 54%
AI Score: 44

23 Papers

CV | Apr 11, 2023
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

Weichuang Li, Longhao Zhang, Dong Wang et al.

Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression remains unsatisfactory, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene as a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/

IV | Mar 19, 2023
Fully Self-Supervised Depth Estimation from Defocus Clue

Haozhe Si, Bin Zhao, Dong Wang et al.

Depth-from-defocus (DFD), modeling the relationship between depth and defocus pattern in images, has demonstrated promising performance in depth estimation. Recently, several self-supervised works have tried to overcome the difficulties in acquiring accurate depth ground truth. However, they depend on all-in-focus (AIF) images, which cannot be captured in real-world scenarios. This limitation hinders the application of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. We show that our framework circumvents the need for depth and AIF image ground truth and produces superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world. In particular, we propose (i) a more realistic setting for DFD tasks, where no depth or AIF image ground truth is available; (ii) a novel self-supervision framework that provides reliable predictions of depth and AIF image under this challenging setting. The proposed framework uses a neural model to predict the depth and AIF image, and utilizes an optical model to validate and refine the prediction. We verify our framework on three benchmark datasets with rendered focal stacks and real focal stacks. Qualitative and quantitative evaluations show that our method provides a strong baseline for self-supervised DFD tasks.

CV | Mar 21, 2023
Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

Yihao Wang, Zhigang Wang, Bin Zhao et al.

Non-line-of-sight (NLOS) tracking has drawn increasing attention in recent years, due to its ability to detect object motion out of sight. Most previous works on NLOS tracking rely on active illumination, e.g., laser, and suffer from high cost and elaborate experimental conditions. Besides, these techniques are still far from practical application due to oversimplified settings. In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security. To excavate imperceptible changes in videos of the relay wall, we introduce difference frames as an essential carrier of temporal-local motion messages. In addition, we propose PAC-Net, which consists of alternating propagation and calibration, making it capable of leveraging both dynamic and static messages on a frame-level granularity. To evaluate the proposed method, we build and publish the first dynamic passive NLOS tracking dataset, NLOS-Track, which fills the vacuum of realistic NLOS datasets. NLOS-Track contains thousands of NLOS video clips and corresponding trajectories. Both real-shot and synthetic data are included. Our codes and dataset are available at https://againstentropy.github.io/NLOS-Track/.

LG | Aug 6, 2024
Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping

Jinghui Yuan, Chusheng Zeng, Fangyuan Xie et al.

Clustering is a fundamental task in machine learning and data science, and similarity graph-based clustering is an important approach within this domain. Doubly stochastic symmetric similarity graphs provide numerous benefits for clustering problems and downstream tasks, yet learning such graphs remains a significant challenge. Marcus theorem states that a strictly positive symmetric matrix can be transformed into a doubly stochastic symmetric matrix by diagonal matrices. However, in clustering, learning sparse matrices is crucial for computational efficiency. We extend Marcus theorem by proposing the Marcus mapping, which indicates that certain sparse matrices can also be transformed into doubly stochastic symmetric matrices via diagonal matrices. Additionally, we introduce rank constraints into the clustering problem and propose the Doubly Stochastic Adaptive Neighbors Clustering algorithm based on the Marcus Mapping (ANCMM). This ensures that the learned graph naturally divides into the desired number of clusters. We validate the effectiveness of our algorithm through extensive comparisons with state-of-the-art algorithms. Finally, we explore the relationship between the Marcus mapping and optimal transport. We prove that the Marcus mapping solves a specific type of optimal transport problem and demonstrate that solving this problem through Marcus mapping is more efficient than directly applying optimal transport methods.
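The classical statement quoted above (a strictly positive symmetric matrix can be scaled into a doubly stochastic symmetric matrix by a diagonal matrix) can be illustrated with a small numerical sketch. This is not the paper's Marcus mapping for sparse matrices and not the ANCMM algorithm; it is a damped fixed-point iteration for the dense, strictly positive case only, and the function names `symmetric_scaling` and `to_doubly_stochastic` are hypothetical.

```python
import numpy as np

def symmetric_scaling(A, max_iter=1000, tol=1e-10):
    """Find d > 0 such that diag(d) @ A @ diag(d) is doubly stochastic.

    Assumes A is symmetric with strictly positive entries (the dense case
    covered by the classical theorem, not the sparse extension).
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T) or np.any(A <= 0):
        raise ValueError("A must be symmetric with strictly positive entries")
    d = np.ones(A.shape[0])
    for _ in range(max_iter):
        # At the solution d_i * (A d)_i = 1; the square root damps the update.
        d_new = np.sqrt(d / (A @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d

def to_doubly_stochastic(A):
    """Return diag(d) @ A @ diag(d) using the scaling vector found above."""
    A = np.asarray(A, dtype=float)
    d = symmetric_scaling(A)
    return d[:, None] * A * d[None, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.random((5, 5)) + 0.1
    S = to_doubly_stochastic((B + B.T) / 2)
    print(S.sum(axis=0), S.sum(axis=1))  # both should be close to all ones
```

Because the same vector `d` scales rows and columns, the result stays symmetric by construction, which is the property the abstract relies on for similarity graphs.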

CV | Aug 29, 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

Kaijing Ma, Haojian Huang, Jin Chen et al.

Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. This leads to unreliable predictions for noisy, corrupted, and out-of-distribution data. Adapting VTG models to dynamically estimate uncertainties based on user input can address this issue. To this end, we introduce SRAM, a robust network module that benefits from a two-stage cross-modal alignment task. More importantly, it integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training, thus allowing the model to say "I do not know" in scenarios beyond its handling capacity. However, the direct application of traditional DER theory and its regularizer reveals structural flaws, leading to unintended constraints in VTG tasks. In response, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up. To the best of our knowledge, this marks the first successful attempt of DER in VTG. Our extensive quantitative and qualitative results affirm the effectiveness, robustness, and interpretability of our modules and the uncertainty learning paradigm in VTG tasks. The code will be made available.

CV | Nov 30, 2025 | Code
Adaptive Evidential Learning for Temporal-Semantic Robustness in Moment Retrieval

Haojian Huang, Kaijing Ma, Jin Chen et al.

In the domain of moment retrieval, accurately identifying temporal segments within videos based on natural language queries remains challenging. Traditional methods often employ pre-trained models that struggle with fine-grained information and deterministic reasoning, leading to difficulties in aligning with complex or ambiguous moments. To overcome these limitations, we explore Deep Evidential Regression (DER) to construct a vanilla Evidential baseline. However, this approach encounters two major issues: the inability to effectively handle modality imbalance and the structural differences in DER's heuristic uncertainty regularizer, which adversely affect uncertainty estimation. This misalignment results in high uncertainty being incorrectly associated with accurate samples rather than challenging ones. Our observations indicate that existing methods lack the adaptability required for complex video scenarios. In response, we propose Debiased Evidential Learning for Moment Retrieval (DEMR), a novel framework that incorporates a Reflective Flipped Fusion (RFF) block for cross-modal alignment and a query reconstruction task to enhance text sensitivity, thereby reducing bias in uncertainty estimation. Additionally, we introduce a Geom-regularizer to refine uncertainty predictions, enabling adaptive alignment with difficult moments and improving retrieval accuracy. Extensive testing on standard datasets and debiased datasets ActivityNet-CD and Charades-CD demonstrates significant enhancements in effectiveness, robustness, and interpretability, positioning our approach as a promising solution for temporal-semantic robustness in moment retrieval. The code is publicly available at https://github.com/KaijingOfficial/DEMR.

LG | Aug 22, 2024
Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance

Chusheng Zeng, Bocheng Wang, Jinghui Yuan et al.

Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite this progress, most graph contrastive learning models face two challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and introduce noise; 2) the fixed positive and negative sample selection strategy copes poorly with complex real data, thereby impeding the model's capability to capture fine-grained patterns and relationships. To alleviate these problems, we propose the Clustering-guided Curriculum Graph contrastive Learning (CCGL) framework. CCGL uses clustering entropy to guide the subsequent graph augmentation and contrastive learning. Specifically, according to the clustering entropy, the intra-class edges and important features are emphasized in augmentation. Then, a multi-task curriculum learning scheme is proposed, which employs the clustering guidance to shift the focus from the discrimination task to the clustering task. In this way, the sample selection strategy of contrastive learning can be adjusted adaptively from the early to the late stage, which enhances the model's flexibility for complex data structures. Experimental results demonstrate that CCGL achieves excellent performance compared to state-of-the-art competitors.

CV | Nov 17, 2023
Traffic Sign Interpretation in Real Road Scene

Chuang Yang, Kai Zhuang, Mulin Chen et al.

Most existing traffic sign-related works are dedicated to detecting and recognizing individual traffic signs in isolation, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instructions. To address these issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally, semantically interrelated traffic signs (e.g., driving instruction-related texts, symbols, and guide panels) into natural language to provide accurate instruction support for autonomous or assisted driving. Meanwhile, we design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into natural language like a human. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images, which are captured on highways and urban roads in China from a driver's perspective. It contains rich location labels of texts, symbols, and guide panels, and the corresponding natural language description labels. Experiments on TSI-CN demonstrate that the TSI task is achievable and that the TSI architecture can interpret traffic signs from scenes successfully even when there is complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.

CV | Apr 15, 2024 | Code
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

Haojian Huang, Xiaozhen Qiao, Zhuo Chen et al.

Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances and attribute co-occurrence among instances often hinder the discernment of local variances in images, a problem exacerbated by the scarcity of fine-grained, region-specific attribute annotations. Moreover, the variability in visual presentation within categories can also skew attribute-category associations. In response, we propose a bidirectional cross-modal ZSL approach CREST. It begins by extracting representations for attribute and visual localization and employs Evidential Deep Learning (EDL) to measure underlying epistemic uncertainty, thereby enhancing the model's resilience against hard negatives. CREST incorporates dual learning pathways, focusing on both visual-category and attribute-category alignments, to ensure robust correlation between latent and observable spaces. Moreover, we introduce an uncertainty-informed cross-modal fusion technique to refine visual-attribute inference. Extensive experiments demonstrate our model's effectiveness and unique explainability across multiple datasets. Our code and data are available at: https://github.com/JethroJames/CREST

LG | Sep 10, 2024
Towards Robust Uncertainty-Aware Incomplete Multi-View Classification

Mulin Chen, Haojian Huang, Qiang Li

Handling incomplete data in multi-view classification is challenging, especially when traditional imputation methods introduce biases that compromise uncertainty estimation. Existing Evidential Deep Learning (EDL) based approaches attempt to address these issues, but they often struggle with conflicting evidence due to the limitations of the Dempster-Shafer combination rule, leading to unreliable decisions. To address these challenges, we propose the Alternating Progressive Learning Network (APLN), specifically designed to enhance EDL-based methods in incomplete MVC scenarios. Our approach mitigates bias from corrupted observed data by first applying coarse imputation, followed by mapping the data to a latent space. In this latent space, we progressively learn an evidence distribution aligned with the target domain, incorporating uncertainty considerations through EDL. Additionally, we introduce a conflict-aware Dempster-Shafer combination rule (DSCR) to better handle conflicting evidence. By sampling from the learned distribution, we optimize the latent representations of missing views, reducing bias and enhancing decision-making robustness. Extensive experiments demonstrate that APLN, combined with DSCR, significantly outperforms traditional methods, particularly in environments characterized by high uncertainty and conflicting evidence, establishing it as a promising solution for incomplete multi-view classification.
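For context on the combination rule that the abstract critiques, the sketch below shows the classical reduced Dempster-Shafer rule as it is commonly used in evidential multi-view classification. It is not the conflict-aware DSCR proposed in the paper. It assumes each view outputs non-negative class evidence that is mapped to belief masses and one uncertainty mass through a Dirichlet with parameters `evidence + 1`; the helper names are hypothetical.

```python
import numpy as np

def evidence_to_opinion(evidence):
    """Map non-negative class evidence to (belief masses, uncertainty mass).

    Assumes Dirichlet parameters alpha = evidence + 1, so that
    beliefs.sum() + uncertainty == 1.
    """
    e = np.asarray(evidence, dtype=float)
    k = e.size
    strength = e.sum() + k  # Dirichlet strength S = sum(alpha)
    return e / strength, k / strength

def combine_opinions(b1, u1, b2, u2):
    """Classical reduced Dempster-Shafer combination of two opinions."""
    # Conflict: mass that the two views assign to different classes.
    conflict = b1.sum() * b2.sum() - np.dot(b1, b2)
    scale = 1.0 / (1.0 - conflict)
    beliefs = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty

if __name__ == "__main__":
    b1, u1 = evidence_to_opinion([9.0, 1.0, 0.0])
    b2, u2 = evidence_to_opinion([8.0, 0.0, 2.0])
    b, u = combine_opinions(b1, u1, b2, u2)
    print(b, u, b.sum() + u)
```

The rule divides by `1 - conflict`, so the conflicting mass is simply renormalized away. This renormalization is the behavior usually cited when the classical rule is said to handle strongly conflicting views poorly.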

LG | Feb 25, 2024
Deep Contrastive Graph Learning with Clustering-Oriented Guidance

Mulin Chen, Bocheng Wang, Xuelong Li

Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering. To handle the general clustering scenario without a prior graph, these models estimate an initial graph beforehand to apply GCN. Throughout the literature, we have witnessed that 1) most models focus on the initial graph while neglecting the original features. Therefore, the discriminability of the learned representation may be corrupted by a low-quality initial graph; 2) the training procedure lacks effective clustering guidance, which may lead to the incorporation of clustering-irrelevant information into the learned graph. To tackle these problems, the Deep Contrastive Graph Learning (DCGL) model is proposed for general data clustering. Specifically, we establish a pseudo-siamese network, which incorporates auto-encoder with GCN to emphasize both the graph structure and the original features. On this basis, feature-level contrastive learning is introduced to enhance the discriminative capacity, and the relationship between samples and centroids is employed as the clustering-oriented guidance. Afterward, a two-branch graph learning mechanism is designed to extract the local and global structural relationships, which are further embedded into a unified graph under the cluster-level contrastive guidance. Experimental results on several benchmark datasets demonstrate the superiority of DCGL against state-of-the-art algorithms.

LG | Mar 16, 2025
Towards Learnable Anchor for Deep Multi-View Clustering

Bocheng Wang, Chusheng Zeng, Mulin Chen et al.

Deep multi-view clustering incorporating graph learning has shown tremendous potential. However, most methods incur costly quadratic time complexity w.r.t. data size. Theoretically, anchor-based graph learning can alleviate this limitation, but related deep models mainly rely on manual discretization approaches to select anchors, which means that 1) the anchors are fixed during model training and 2) they may deviate from the true cluster distribution. Consequently, the unreliable anchors may corrupt clustering results. In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time. Concretely, the initial anchors are perturbed by positive-incentive noise sampled from a Gaussian distribution, such that they can be optimized with a newly designed anchor learning loss, which promotes a clear relationship between samples and anchors. Afterwards, anchor graph convolution is devised to model the cluster structure formed by the anchors, and the mutual information maximization loss is built to provide cross-view clustering guidance. In this way, the learned anchors can better represent clusters. With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering. Extensive experiments on several datasets demonstrate the superior performance and efficiency of DMAC compared to state-of-the-art competitors.

LG | Jul 25, 2025
Clustering-Oriented Generative Attribute Graph Imputation

Mulin Chen, Bocheng Wang, Jiaxin Zhong et al.

Attribute-missing graph clustering has emerged as a significant unsupervised task, where attribute vectors are available for only a subset of nodes and the graph structure is intact. The related models generally follow the two-step paradigm of imputation and refinement. However, most imputation approaches fail to capture class-relevant semantic information, leading to sub-optimal imputation for clustering. Moreover, existing refinement strategies optimize the learned embedding through graph reconstruction, while neglecting the fact that some attributes are uncorrelated with the graph. To remedy these problems, we establish the Clustering-oriented Generative Imputation with reliable Refinement (CGIR) model. Concretely, the subcluster distributions are estimated to reveal the class-specific characteristics precisely and to constrain the sampling space of the generative adversarial module, such that the imputed nodes are impelled to align with the correct clusters. Afterwards, multiple subclusters are merged to guide the proposed edge attention network, which identifies the edge-wise attributes for each class, so as to prevent redundant attributes in graph reconstruction from disturbing the refinement of the overall embedding. To sum up, CGIR splits attribute-missing graph clustering into the search and merging of subclusters, which guides node imputation and refinement within a unified framework. Extensive experiments demonstrate the advantages of CGIR over state-of-the-art competitors.

CV | Nov 18, 2021
Adaptive Shrink-Mask for Text Detection

Chuang Yang, Mulin Chen, Yuan Yuan et al.

Existing real-time text detectors reconstruct text contours by shrink-masks directly, which simplifies the framework and can make the model run fast. However, the strong dependence on predicted shrink-masks leads to unstable detection results. Moreover, the discrimination of shrink-masks is a pixelwise prediction task. Supervising the network by shrink-masks only will lose much semantic context, which leads to the false detection of shrink-masks. To address these problems, we construct an efficient text detection network, Adaptive Shrink-Mask for Text Detection (ASMTD), which improves the accuracy during training and reduces the complexity of the inference process. At first, the Adaptive Shrink-Mask (ASM) is proposed to represent texts by shrink-masks and independent adaptive offsets. It weakens the coupling of texts to shrink-masks, which improves the robustness of detection results. Then, the Super-pixel Window (SPW) is designed to supervise the network. It utilizes the surroundings of each pixel to improve the reliability of predicted shrink-masks and does not appear during testing. In the end, a lightweight feature merging branch is constructed to reduce the computational cost. As demonstrated in the experiments, our method is superior to existing state-of-the-art (SOTA) methods in both detection accuracy and speed on multiple benchmarks.

CV | May 12, 2021
MT: Multi-Perspective Feature Learning Network for Scene Text Detection

Chuang Yang, Mulin Chen, Yuan Yuan et al.

Text detection, the key technology for understanding scene text, has become an attractive research topic. For detecting various scene texts, researchers have proposed plenty of detectors with different advantages: detection-based models enjoy fast detection speed, and segmentation-based algorithms are not limited by text shapes. However, for most intelligent systems, the detector needs to detect arbitrary-shaped texts with high speed and accuracy simultaneously. Thus, in this study, we design an efficient pipeline named MT, which can detect adhesive arbitrary-shaped texts with only a single binary mask in the inference stage. This paper makes contributions in three aspects: (1) a lightweight detection framework is designed to speed up the inference process while keeping high detection accuracy; (2) a multi-perspective feature module is proposed to learn more discriminative representations to segment the mask accurately; (3) a multi-factor constraints IoU minimization loss is introduced for training the proposed model. The effectiveness of MT is evaluated on four real-world scene text datasets, and it surpasses all the state-of-the-art competitors by a large margin.

CV | Apr 24, 2021
Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image

Qi Wang, Yanling Miao, Mulin Chen et al.

Hyperspectral image (HSI) clustering, which aims at dividing hyperspectral pixels into clusters, has drawn significant attention in practical applications. Recently, many graph-based clustering methods, which construct an adjacent graph to model the data relationship, have shown dominant performance. However, the high dimensionality of HSI data makes it hard to construct the pairwise adjacent graph. Besides, abundant spatial structures are often overlooked during the clustering procedure. In order to better handle the high dimensionality problem and preserve the spatial structures, this paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering. The SSCAG has the following contributions: 1) the anchor graph-based strategy is used to construct a tractable large graph for HSI data, which effectively exploits all data points and reduces the computational complexity; 2) a new similarity metric is presented to embed the spatial-spectral information into the combined adjacent graph, which can mine the intrinsic property structure of HSI data; 3) an effective neighbors assignment strategy is adopted in the optimization, which performs the singular value decomposition (SVD) on the adjacent graph to get solutions efficiently. Extensive experiments on three public HSI datasets show that the proposed SSCAG is competitive against the state-of-the-art approaches.

LG | Apr 11, 2021
Auto-weighted Multi-view Feature Selection with Graph Optimization

Qi Wang, Xu Jiang, Mulin Chen et al.

In this paper, we focus on unsupervised multi-view feature selection, which aims to handle high-dimensional data in the field of multi-view learning. Although some graph-based methods have achieved satisfactory performance, they ignore the underlying data structure across different views. Besides, their pre-defined Laplacian graphs are sensitive to noise in the original data space, and fail to obtain the optimal neighbor assignment. To address the above problems, we propose a novel unsupervised multi-view feature selection model based on graph learning, and the contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned, so the proposed model can reveal the data relationship from the feature subset; (2) a reasonable rank constraint is added to optimize the similarity matrix to obtain more accurate information; (3) an auto-weighted framework is presented to assign view weights adaptively, and an effective alternating iterative algorithm is proposed to optimize the problem. Experiments on various datasets demonstrate the superiority of the proposed method compared with state-of-the-art methods.

CV | Apr 11, 2021
BiP-Net: Bidirectional Perspective Strategy based Arbitrary-Shaped Text Detection Network

Chuang Yang, Mulin Chen, Yuan Yuan et al.

Detecting irregular-shaped text instances is the main challenge for text detection. Existing approaches can be roughly divided into top-down and bottom-up perspective methods. The former encodes text contours into unified units, which always fails to fit highly curved text contours. The latter represents text instances by a number of local units, where the complicated network and post-processing lead to slow detection speed. In this paper, to detect arbitrary-shaped text instances with high detection accuracy and speed simultaneously, we propose a Bidirectional Perspective strategy based Network (BiP-Net). Specifically, a new text representation strategy is proposed to represent text contours from a top-down perspective, which can fit highly curved text contours effectively. Moreover, a contour connecting (CC) algorithm is proposed to avoid the information loss of text contours by rebuilding interval contours from a bottom-up perspective. The experimental results on MSRA-TD500, CTW1500, and ICDAR2015 datasets demonstrate the superiority of BiP-Net against several state-of-the-art methods.

CV | Mar 25, 2021
Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning

Yanling Miao, Qi Wang, Mulin Chen et al.

Graph-based semi-supervised learning methods, which deal well with the situation of limited labeled data, have shown dominant performance in practical applications. However, the high dimensionality of hyperspectral images (HSI) makes it hard to construct the pairwise adjacent graph. Besides, the fine spatial features that help improve the discriminability of the model are often overlooked. To handle the problems, this paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE). Firstly, the local binary pattern is adopted to extract the more descriptive features on each selected band, which preserves local structures and subtle changes of a region. Secondly, the adaptive neighbors assignment is introduced in the construction of anchor graph, to reduce the computational complexity. Finally, an ensemble model is built by utilizing multiple anchor graphs, such that the diversity of HSI is learned. Extensive experiments show that RAGE is competitive against the state-of-the-art approaches.

IV | Mar 24, 2021
Feature Weighted Non-negative Matrix Factorization

Mulin Chen, Maoguo Gong, Xuelong Li

Non-negative Matrix Factorization (NMF) is one of the most popular techniques for data representation and clustering, and has been widely used in machine learning and data analysis. NMF concentrates the features of each sample into a vector, and approximates it by a linear combination of basis vectors, such that low-dimensional representations are obtained. However, in real-world applications, features usually differ in importance. To exploit the discriminative features, some methods project the samples into a subspace with a transformation matrix, which disturbs the original feature attributes and neglects the diversity of samples. To alleviate the above problems, we propose the Feature weighted Non-negative Matrix Factorization (FNMF) in this paper. The salient properties of FNMF are threefold: 1) it learns the weights of features adaptively according to their importance; 2) it utilizes multiple feature weighting components to preserve the diversity; 3) it can be solved efficiently with the suggested optimization algorithm. Experiments on synthetic and real-world datasets demonstrate that the proposed method achieves state-of-the-art performance.

IV | Mar 24, 2021
Entropy Minimizing Matrix Factorization

Mulin Chen, Xuelong Li

Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks. Generally, existing NMF methods represent each sample with several centroids, and find the optimal centroids by minimizing the sum of the approximation errors. However, the outliers deviating from the normal data distribution may have large residues, and then dominate the objective value seriously. In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to tackle the above problem. Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization, which minimizes the entropy of the residue distribution and allows a few samples to have large approximation errors. In this way, the outliers do not affect the approximation of the normal samples. The multiplicative updating rules for EMMF are also designed, and the convergence is proved both theoretically and experimentally. In addition, a Graph regularized version of EMMF (G-EMMF) is also presented to deal with the complex data structure. Clustering results on various synthetic and real-world datasets demonstrate the reasonableness of the proposed models, and the effectiveness is also verified through the comparison with the state-of-the-arts.
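As a point of reference for the baseline this abstract starts from, the sketch below implements standard NMF with the well-known Lee-Seung multiplicative updates for the squared Frobenius error. It does not implement the entropy loss of EMMF or its graph-regularized variant. The function name `nmf` and the convention that columns of `X` are samples are assumptions made for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Standard NMF, X ~= W @ H, via Lee-Seung multiplicative updates.

    X    : non-negative array of shape (n_features, n_samples)
    W    : basis vectors (centroids), shape (n_features, rank)
    H    : per-sample coefficients, shape (rank, n_samples)
    Returns W, H and the squared reconstruction error after each iteration.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids 0/0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H) ** 2)
    return W, H, errors

if __name__ == "__main__":
    X = np.random.default_rng(1).random((8, 20))
    W, H, errors = nmf(X, rank=3)
    print(errors[0], errors[-1])
```

In this baseline every sample contributes its squared residual to the objective, so a single outlier with a large residual can dominate the sum. That sensitivity is the issue the entropy-based loss described above is meant to reduce.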

CV | Mar 17, 2021
Multi-channel Deep Supervision for Crowd Counting

Bo Wei, Mulin Chen, Qi Wang et al.

Crowd counting is a task worth exploring in modern society because of its wide applications such as public safety and video monitoring. Many CNN-based approaches have been proposed to improve the accuracy of estimation, but some inherent issues still affect performance, such as overfitting and the loss of detail caused by pooling layers. To tackle these problems, in this paper, we propose an effective network called MDSNet, which introduces a novel supervision framework called Multi-channel Deep Supervision (MDS). The MDS conducts channel-wise supervision on the decoder of the estimation model to help generate the density maps. To obtain accurate supervision information for different channels, the MDSNet employs an auxiliary network called SupervisionNet (SN) to generate abundant supervision maps based on the existing ground truth. Besides the traditional density map supervision, we also use the SN to convert the dot annotations into continuous supervision information and conduct dot supervision in the MDSNet. Extensive experiments on several mainstream benchmarks show that the proposed MDSNet achieves competitive results and that the MDS significantly improves performance without changing the network structure.

CV | Nov 30, 2020
CM-Net: Concentric Mask based Arbitrary-Shaped Text Detection

Chuang Yang, Mulin Chen, Zhitong Xiong et al.

Recently, fast arbitrary-shaped text detection has become an attractive research topic. However, most existing methods are non-real-time, which may fall short in intelligent systems. Although a few real-time text detection methods have been proposed, their detection accuracy is far behind that of non-real-time methods. To improve the detection accuracy and speed simultaneously, we propose a novel fast and accurate text detection framework, namely CM-Net, which is constructed based on a new text representation method and a multi-perspective feature (MPF) module. The former can fit arbitrary-shaped text contours by concentric mask (CM) in an efficient and robust way. The latter encourages the network to learn more CM-related discriminative features from multiple perspectives and brings no extra computational cost. Benefiting from the advantages of CM and MPF, the proposed CM-Net only needs to predict one CM of the text instance to rebuild the text contour and achieves the best balance between detection accuracy and speed compared with previous works. Moreover, to ensure that multi-perspective features are effectively learned, the multi-factor constraints loss is proposed. Extensive experiments demonstrate that the proposed CM is efficient and robust in fitting arbitrary-shaped text instances, and also validate the effectiveness of MPF and the constraints loss for discriminative text feature recognition. Furthermore, experimental results show that the proposed CM-Net is superior to existing state-of-the-art (SOTA) real-time text detection methods in both detection speed and accuracy on MSRA-TD500, CTW1500, Total-Text, and ICDAR2015 datasets.