Zhang Yi

CV
h-index23
28papers
1,780citations
Novelty52%
AI Score58

28 Papers

IVAug 30, 2024
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection

Zeyu Zhang, Nengmin Yi, Shengbo Tan et al.

Cervical disc herniation (CDH) is a prevalent musculoskeletal disorder that significantly impacts health and requires labor-intensive analysis from experts. Despite advancements in automated detection of medical imaging, two significant challenges hinder the real-world application of these methods. First, the computational complexity and resource demands present a significant gap for real-time application. Second, noise in MRI reduces the effectiveness of existing methods by distorting feature extraction. To address these challenges, we propose three key contributions: Firstly, we introduced MedDet, which leverages the multi-teacher single-student knowledge distillation for model compression and efficiency, meanwhile integrating generative adversarial training to enhance performance. Additionally, we customize the second-order nmODE to improve the model's resistance to noise in MRI. Lastly, we conducted comprehensive experiments on the CDH-1848 dataset, achieving up to a 5% improvement in mAP compared to previous methods. Our approach also delivers over 5 times faster inference speed, with approximately 67.8% reduction in parameters and 36.9% reduction in FLOPs compared to the teacher model. These advancements significantly enhance the performance and efficiency of automated CDH detection, demonstrating promising potential for future application in clinical practice. See project website https://steve-zeyu-zhang.github.io/MedDet

LGJul 14, 2022
Deep Dictionary Learning with An Intra-class Constraint

Xia Yuan, Jianping Gou, Baosheng Yu et al.

In recent years, deep dictionary learning (DDL)has attracted a great amount of attention due to its effectiveness for representation learning and visual recognition.~However, most existing methods focus on unsupervised deep dictionary learning, failing to further explore the category information.~To make full use of the category information of different samples, we propose a novel deep dictionary learning model with an intra-class constraint (DDLIC) for visual classification. Specifically, we design the intra-class compactness constraint on the intermediate representation at different levels to encourage the intra-class representations to be closer to each other, and eventually the learned representation becomes more discriminative.~Unlike the traditional DDL methods, during the classification stage, our DDLIC performs a layer-wise greedy optimization in a similar way to the training stage. Experimental results on four image datasets show that our method is superior to the state-of-the-art methods.

IVDec 28, 2024Code
SegKAN: High-Resolution Medical Image Segmentation with Long-Distance Dependencies

Shengbo Tan, Rundong Xue, Shipeng Luo et al.

Hepatic vessels in computed tomography scans often suffer from image fragmentation and noise interference, making it difficult to maintain vessel integrity and posing significant challenges for vessel segmentation. To address this issue, we propose an innovative model: SegKAN. First, we improve the conventional embedding module by adopting a novel convolutional network structure for image embedding, which smooths out image noise and prevents issues such as gradient explosion in subsequent stages. Next, we transform the spatial relationships between Patch blocks into temporal relationships to solve the problem of capturing positional relationships between Patch blocks in traditional Vision Transformer models. We conducted experiments on a Hepatic vessel dataset, and compared to the existing state-of-the-art model, the Dice score improved by 1.78%. These results demonstrate that the proposed new structure effectively enhances the segmentation performance of high-resolution extended objects. Code will be available at https://github.com/goblin327/SegKAN

CVDec 9, 2024Code
A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder

Quansong He, Xiaojun Yao, Jun Wu et al.

In recent years, advanced U-like networks have demonstrated remarkable performance in medical image segmentation tasks. However, their drawbacks, including excessive parameters, high computational complexity, and slow inference speed, pose challenges for practical implementation in scenarios with limited computational resources. Existing lightweight U-like networks have alleviated some of these problems, but they often have pre-designed structures and consist of inseparable modules, limiting their application scenarios. In this paper, we propose three plug-and-play decoders by employing different discretization methods of the neural memory Ordinary Differential Equations (nmODEs). These decoders integrate features at various levels of abstraction by processing information from skip connections and performing numerical operations on upward path. Through experiments on the PH2, ISIC2017, and ISIC2018 datasets, we embed these decoders into different U-like networks, demonstrating their effectiveness in significantly reducing the number of parameters and FLOPs while maintaining performance. In summary, the proposed discretized nmODEs decoders are capable of reducing the number of parameters by about 20% ~ 50% and FLOPs by up to 74%, while possessing the potential to adapt to all U-like networks. Our code is available at https://github.com/nayutayuki/Lightweight-nmODE-Decoders-For-U-like-networks.

CVJun 19, 2024Code
Strengthening Layer Interaction via Dynamic Layer Attention

Kaishen Wang, Xun Xia, Jian Liu et al.

In recent years, employing layer attention to enhance interaction among hierarchical layers has proven to be a significant advancement in building network structures. In this paper, we delve into the distinction between layer attention and the general attention mechanism, noting that existing layer attention methods achieve layer interaction on fixed feature maps in a static manner. These static layer attention methods limit the ability for context feature extraction among layers. To restore the dynamic context representation capability of the attention mechanism, we propose a Dynamic Layer Attention (DLA) architecture. The DLA comprises dual paths, where the forward path utilizes an improved recurrent neural network block, named Dynamic Sharing Unit (DSU), for context feature extraction. The backward path updates features using these shared context representations. Finally, the attention mechanism is applied to these dynamically refreshed feature maps among layers. Experimental results demonstrate the effectiveness of the proposed DLA architecture, outperforming other state-of-the-art methods in image recognition and object detection tasks. Additionally, the DSU block has been evaluated as an efficient plugin in the proposed DLA architecture.The code is available at https://github.com/tunantu/Dynamic-Layer-Attention.

CVNov 11, 2025
Class Incremental Medical Image Segmentation via Prototype-Guided Calibration and Dual-Aligned Distillation

Shengqian Zhu, Chengrong Yu, Qiang Wang et al.

Class incremental medical image segmentation (CIMIS) aims to preserve knowledge of previously learned classes while learning new ones without relying on old-class labels. However, existing methods 1) either adopt one-size-fits-all strategies that treat all spatial regions and feature channels equally, which may hinder the preservation of accurate old knowledge, 2) or focus solely on aligning local prototypes with global ones for old classes while overlooking their local representations in new data, leading to knowledge degradation. To mitigate the above issues, we propose Prototype-Guided Calibration Distillation (PGCD) and Dual-Aligned Prototype Distillation (DAPD) for CIMIS in this paper. Specifically, PGCD exploits prototype-to-feature similarity to calibrate class-specific distillation intensity in different spatial regions, effectively reinforcing reliable old knowledge and suppressing misleading information from old classes. Complementarily, DAPD aligns the local prototypes of old classes extracted from the current model with both global prototypes and local prototypes, further enhancing segmentation performance on old categories. Comprehensive evaluations on two widely used multi-organ segmentation benchmarks demonstrate that our method outperforms state-of-the-art methods, highlighting its robustness and generalization capabilities.

CVApr 28
QB-LIF: Learnable-Scale Quantized Burst Neurons for Efficient SNNs

Dewei Bai, Hongxiang Peng, Jiajun Mei et al.

Binary spike coding enables sparse and event-driven computation in spiking neural networks (SNNs), yet its 1-bit-per-timestep representation fundamentally limits information throughput. This bottleneck becomes increasingly restrictive in deep architectures under short simulation horizons. We propose the Quantized Burst-LIF (QB-LIF) neuron, which reformulates burst spiking as a saturated uniform quantization of membrane potentials with a learnable scale. Instead of relying on predefined multi-threshold structures, QB-LIF treats the quantization scale as a trainable parameter, allowing each layer to autonomously adapt its spiking resolution to the underlying membrane-potential statistics. To preserve hardware efficiency, we introduce an absorbable scale strategy that folds the learned quantized scale into synaptic weights during inference, maintaining a strict accumulate-only (AC) execution paradigm. To enable stable optimization in the discrete multi-level space, we further design ReLSG-ET, a rectified-linear surrogate gradient with exponential tails that sustains gradient flow across burst intervals. Extensive experiments on static (CIFAR-10/100, ImageNet) and event-driven (CIFAR10-DVS, DVS128-Gesture) benchmarks demonstrate that QB-LIF consistently outperforms binary and fixed-burst SNNs, achieving higher accuracy under ultra-low latency while preserving neuromorphic compatibility.

CVApr 27
Unconstrained Multi-view Human Pose Estimation with Algebraic Priors

Xiaolin Qin, Qianlei Wang, Jiacen Liu et al.

Recovering 3D human pose from multi-view imagery typically relies on precise camera calibration, which is often unavailable in real-world scenarios, thereby severely limiting the applicability of existing methods. To overcome this challenge, we propose an unconstrained framework that synergizes deep neural networks, algebraic priors, and temporal dynamics for uncalibrated multi-view human pose estimation. First, we introduce the Triangulation with Transformer Regressor (TTR), which reformulates classical triangulation into a data-driven token fusion process to bypass the dependency on explicit camera parameters. Second, to explicitly embed the inherent algebraic relations of the multi-view variety into the learning process, we propose the Gröbner basis Corrector (GC). This pioneering loss formulation enforces constraints derived from the multi-view variety to ensure the neural predictions strictly adhere to the laws of projective geometry. Finally, we devise the Temporal Equivariant Rectifier (TER), which exploits the equivariance property of human motion to impose temporal coherence and structural consistency, effectively mitigating scale ambiguity in uncalibrated settings. Extensive evaluations on standard benchmarks demonstrate that our framework establishes a new state-of-the-art for uncalibrated multi-view human pose estimation. Notably, our approach significantly closes the performance gap between calibration-free methods and fully calibrated oracles.

CVSep 18, 2025
Enhancing Feature Fusion of U-like Networks with Dynamic Skip Connections

Yue Cao, Quansong He, Kaishen Wang et al.

U-like networks have become fundamental frameworks in medical image segmentation through skip connections that bridge high-level semantics and low-level spatial details. Despite their success, conventional skip connections exhibit two key limitations: inter-feature constraints and intra-feature constraints. The inter-feature constraint refers to the static nature of feature fusion in traditional skip connections, where information is transmitted along fixed pathways regardless of feature content. The intra-feature constraint arises from the insufficient modeling of multi-scale feature interactions, thereby hindering the effective aggregation of global contextual information. To overcome these limitations, we propose a novel Dynamic Skip Connection (DSC) block that fundamentally enhances cross-layer connectivity through adaptive mechanisms. The DSC block integrates two complementary components. (1) Test-Time Training (TTT) module. This module addresses the inter-feature constraint by enabling dynamic adaptation of hidden representations during inference, facilitating content-aware feature refinement. (2) Dynamic Multi-Scale Kernel (DMSK) module. To mitigate the intra-feature constraint, this module adaptively selects kernel sizes based on global contextual cues, enhancing the network capacity for multi-scale feature integration. The DSC block is architecture-agnostic and can be seamlessly incorporated into existing U-like network structures. Extensive experiments demonstrate the plug-and-play effectiveness of the proposed DSC block across CNN-based, Transformer-based, hybrid CNN-Transformer, and Mamba-based U-like networks.

CVSep 5, 2025
Exploiting Unlabeled Structures through Task Consistency Training for Versatile Medical Image Segmentation

Shengqian Zhu, Jiafei Wu, Xiaogang Xu et al.

Versatile medical image segmentation (VMIS) targets the segmentation of multiple classes, while obtaining full annotations for all classes is often impractical due to the time and labor required. Leveraging partially labeled datasets (PLDs) presents a promising alternative; however, current VMIS approaches face significant class imbalance due to the unequal category distribution in PLDs. Existing methods attempt to address this by generating pseudo-full labels. Nevertheless, these typically require additional models and often result in potential performance degradation from label noise. In this work, we introduce a Task Consistency Training (TCT) framework to address class imbalance without requiring extra models. TCT includes a backbone network with a main segmentation head (MSH) for multi-channel predictions and multiple auxiliary task heads (ATHs) for task-specific predictions. By enforcing a consistency constraint between the MSH and ATH predictions, TCT effectively utilizes unlabeled anatomical structures. To avoid error propagation from low-consistency, potentially noisy data, we propose a filtering strategy to exclude such data. Additionally, we introduce a unified auxiliary uncertainty-weighted loss (UAUWL) to mitigate segmentation quality declines caused by the dominance of specific tasks. Extensive experiments on eight abdominal datasets from diverse clinical sites demonstrate our approach's effectiveness.

LGSep 3, 2025
A Differential Manifold Perspective and Universality Analysis of Continuous Attractors in Artificial Neural Networks

Shaoxin Tian, Hongkai Liu, Yuying Yang et al.

Continuous attractors are critical for information processing in both biological and artificial neural systems, with implications for spatial navigation, memory, and deep learning optimization. However, existing research lacks a unified framework to analyze their properties across diverse dynamical systems, limiting cross-architectural generalizability. This study establishes a novel framework from the perspective of differential manifolds to investigate continuous attractors in artificial neural networks. It verifies compatibility with prior conclusions, elucidates links between continuous attractor phenomena and eigenvalues of the local Jacobian matrix, and demonstrates the universality of singular value stratification in common classification models and datasets. These findings suggest continuous attractors may be ubiquitous in general neural networks, highlighting the need for a general theory, with the proposed framework offering a promising foundation given the close mathematical connection between eigenvalues and singular values.

CVJan 3, 2025
EAUWSeg: Eliminating annotation uncertainty in weakly-supervised medical image segmentation

Wang Lituan, Zhang Lei, Wang Yan et al.

Weakly-supervised medical image segmentation is gaining traction as it requires only rough annotations rather than accurate pixel-to-pixel labels, thereby reducing the workload for specialists. Although some progress has been made, there is still a considerable performance gap between the label-efficient methods and fully-supervised one, which can be attributed to the uncertainty nature of these weak labels. To address this issue, we propose a novel weak annotation method coupled with its learning framework EAUWSeg to eliminate the annotation uncertainty. Specifically, we first propose the Bounded Polygon Annotation (BPAnno) by simply labeling two polygons for a lesion. Then, the tailored learning mechanism that explicitly treat bounded polygons as two separated annotations is proposed to learn invariant feature by providing adversarial supervision signal for model training. Subsequently, a confidence-auxiliary consistency learner incorporates with a classification-guided confidence generator is designed to provide reliable supervision signal for pixels in uncertain region by leveraging the feature presentation consistency across pixels within the same category as well as class-specific information encapsulated in bounded polygons annotation. Experimental results demonstrate that EAUWSeg outperforms existing weakly-supervised segmentation methods. Furthermore, compared to fully-supervised counterparts, the proposed method not only delivers superior performance but also costs much less annotation workload. This underscores the superiority and effectiveness of our approach.

CVMay 15, 2024
Fourier Boundary Features Network with Wider Catchers for Glass Segmentation

Xiaolin Qin, Jiacen Liu, Qianlei Wang et al.

Glass largely blurs the boundary between the real world and the reflection. The special transmittance and reflectance quality have confused the semantic tasks related to machine vision. Therefore, how to clear the boundary built by glass, and avoid over-capturing features as false positive information in deep structure, matters for constraining the segmentation of reflection surface and penetrating glass. We proposed the Fourier Boundary Features Network with Wider Catchers (FBWC), which might be the first attempt to utilize sufficiently wide horizontal shallow branches without vertical deepening for guiding the fine granularity segmentation boundary through primary glass semantic information. Specifically, we designed the Wider Coarse-Catchers (WCC) for anchoring large area segmentation and reducing excessive extraction from a structural perspective. We embed fine-grained features by Cross Transpose Attention (CTA), which is introduced to avoid the incomplete area within the boundary caused by reflection noise. For excavating glass features and balancing high-low layers context, a learnable Fourier Convolution Controller (FCC) is proposed to regulate information integration robustly. The proposed method has been validated on three different public glass segmentation datasets. Experimental results reveal that the proposed method yields better segmentation performance compared with the state-of-the-art (SOTA) methods in glass image segmentation.

IVJan 3, 2021
RegNet: Self-Regulated Network for Image Classification

Jing Xu, Yu Pan, Xinglin Pan et al.

The ResNet and its variants have achieved remarkable successes in various computer vision tasks. Despite its success in making gradient flow through building blocks, the simple shortcut connection mechanism limits the ability of re-exploring new potentially complementary features due to the additive function. To address this issue, in this paper, we propose to introduce a regulator module as a memory mechanism to extract complementary features, which are further fed to the ResNet. In particular, the regulator module is composed of convolutional RNNs (e.g., Convolutional LSTMs or Convolutional GRUs), which are shown to be good at extracting Spatio-temporal information. We named the new regulated networks as RegNet. The regulator module can be easily implemented and appended to any ResNet architecture. We also apply the regulator module for improving the Squeeze-and-Excitation ResNet to show the generalization ability of our method. Experimental results on three image classification datasets have demonstrated the promising performance of the proposed architecture compared with the standard ResNet, SE-ResNet, and other state-of-the-art architectures.

NEFeb 24, 2018
IGD Indicator-based Evolutionary Algorithm for Many-objective Optimization Problems

Yanan Sun, Gary G. Yen, Zhang Yi

Inverted Generational Distance (IGD) has been widely considered as a reliable performance indicator to concurrently quantify the convergence and diversity of multi- and many-objective evolutionary algorithms. In this paper, an IGD indicator-based evolutionary algorithm for solving many-objective optimization problems (MaOPs) has been proposed. Specifically, the IGD indicator is employed in each generation to select the solutions with favorable convergence and diversity. In addition, a computationally efficient dominance comparison method is designed to assign the rank values of solutions along with three newly proposed proximity distance assignments. Based on these two designs, the solutions are selected from a global view by linear assignment mechanism to concern the convergence and diversity simultaneously. In order to facilitate the accuracy of the sampled reference points for the calculation of IGD indicator, we also propose an efficient decomposition-based nadir point estimation method for constructing the Utopian Pareto front which is regarded as the best approximate Pareto front for real-world MaOPs at the early stage of the evolution. To evaluate the performance, a series of experiments is performed on the proposed algorithm against a group of selected state-of-the-art many-objective optimization algorithms over optimization problems with $8$-, $15$-, and $20$-objective. Experimental results measured by the chosen performance metrics indicate that the proposed algorithm is very competitive in addressing MaOPs.

NEFeb 24, 2018
Improved Regularity Model-based EDA for Many-objective Optimization

Yanan Sun, Gary G. Yen, Zhang Yi

The performance of multi-objective evolutionary algorithms deteriorates appreciably in solving many-objective optimization problems which encompass more than three objectives. One of the known rationales is the loss of selection pressure which leads to the selected parents not generating promising offspring towards Pareto-optimal front with diversity. Estimation of distribution algorithms sample new solutions with a probabilistic model built from the statistics extracting over the existing solutions so as to mitigate the adverse impact of genetic operators. In this paper, an improved regularity-based estimation of distribution algorithm is proposed to effectively tackle unconstrained many-objective optimization problems. In the proposed algorithm, \emph{diversity repairing mechanism} is utilized to mend the areas where need non-dominated solutions with a closer proximity to the Pareto-optimal front. Then \emph{favorable solutions} are generated by the model built from the regularity of the solutions surrounding a group of representatives. These two steps collectively enhance the selection pressure which gives rise to the superior convergence of the proposed algorithm. In addition, dimension reduction technique is employed in the decision space to speed up the estimation search of the proposed algorithm. Finally, by assigning the Pareto-optimal solutions to the uniformly distributed reference vectors, a set of solutions with excellent diversity and convergence is obtained. To measure the performance, NSGA-III, GrEA, MOEA/D, HypE, MBN-EDA, and RM-MEDA are selected to perform comparison experiments over DTLZ and DTLZ$^-$ test suites with $3$-, $5$-, $8$-, $10$-, and $15$-objective. Experimental results quantified by the selected performance metrics reveal that the proposed algorithm shows considerable competitiveness in addressing unconstrained many-objective optimization problems.

NEDec 13, 2017
Evolving Unsupervised Deep Neural Networks for Learning Meaningful Representations

Yanan Sun, Gary G. Yen, Zhang Yi

Deep Learning (DL) aims at learning the \emph{meaningful representations}. A meaningful representation refers to the one that gives rise to significant performance improvement of associated Machine Learning (ML) tasks by replacing the raw data as the input. However, optimal architecture design and model parameter estimation in DL algorithms are widely considered to be intractable. Evolutionary algorithms are much preferable for complex and non-convex problems due to its inherent characteristics of gradient-free and insensitivity to local optimum. In this paper, we propose a computationally economical algorithm for evolving \emph{unsupervised deep neural networks} to efficiently learn \emph{meaningful representations}, which is very suitable in the current Big Data era where sufficient labeled data for training is often expensive to acquire. In the proposed algorithm, finding an appropriate architecture and the initialized parameter values for a ML task at hand is modeled by one computational efficient gene encoding approach, which is employed to effectively model the task with a large number of parameters. In addition, a local search strategy is incorporated to facilitate the exploitation search for further improving the performance. Furthermore, a small proportion labeled data is utilized during evolution search to guarantee the learnt representations to be meaningful. The performance of the proposed algorithm has been thoroughly investigated over classification tasks. Specifically, error classification rate on MNIST with $1.15\%$ is reached by the proposed algorithm consistently, which is a very promising result against state-of-the-art unsupervised DL algorithms.

CVSep 25, 2017
Deep Sparse Subspace Clustering

Xi Peng, Jiashi Feng, Shijie Xiao et al.

In this paper, we present a deep extension of Sparse Subspace Clustering, termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere distribution assumption for the learned deep features, DSSC can infer a new data affinity matrix by simultaneously satisfying the sparsity principle of SSC and the nonlinearity given by neural networks. One of the appealing advantages brought by DSSC is: when original real-world data do not meet the class-specific linear subspace distribution assumption, DSSC can employ neural networks to make the assumption valid with its hierarchical nonlinear transformations. To the best of our knowledge, this is among the first deep learning based subspace clustering methods. Extensive experiments are conducted on four real-world datasets to show the proposed DSSC is significantly superior to 12 existing methods for subspace clustering.

CVFeb 26, 2015
Connections Between Nuclear Norm and Frobenius Norm Based Representations

Xi Peng, Canyi Lu, Zhang Yi et al.

A lot of works have shown that frobenius-norm based representation (FNR) is competitive to sparse representation and nuclear-norm based representation (NNR) in numerous tasks such as subspace clustering. Despite the success of FNR in experimental studies, less theoretical analysis is provided to understand its working mechanism. In this paper, we fill this gap by building the theoretical connections between FNR and NNR. More specially, we prove that: 1) when the dictionary can provide enough representative capacity, FNR is exactly NNR even though the data set contains the Gaussian noise, Laplacian noise, or sample-specified corruption, 2) otherwise, FNR and NNR are two solutions on the column space of the dictionary.

CVNov 17, 2014
Automatic Subspace Learning via Principal Coefficients Embedding

Xi Peng, Jiwen Lu, Zhang Yi et al.

In this paper, we address two challenging problems in unsupervised subspace learning: 1) how to automatically identify the feature dimension of the learned subspace (i.e., automatic subspace learning), and 2) how to learn the underlying subspace in the presence of Gaussian noise (i.e., robust subspace learning). We show that these two problems can be simultaneously solved by proposing a new method (called principal coefficients embedding, PCE). For a given data set $\mathbf{D}\in \mathds{R}^{m\times n}$, PCE recovers a clean data set $\mathbf{D}_{0}\in \mathds{R}^{m\times n}$ from $\mathbf{D}$ and simultaneously learns a global reconstruction relation $\mathbf{C}\in \mathbf{R}^{n\times n}$ of $\mathbf{D}_{0}$. By preserving $\mathbf{C}$ into an $m^{\prime}$-dimensional space, the proposed method obtains a projection matrix that can capture the latent manifold structure of $\mathbf{D}_{0}$, where $m^{\prime}\ll m$ is automatically determined by the rank of $\mathbf{C}$ with theoretical guarantees. PCE has three advantages: 1) it can automatically determine the feature dimension even though data are sampled from a union of multiple linear subspaces in presence of the Gaussian noise, 2) Although the objective function of PCE only considers the Gaussian noise, experimental results show that it is robust to the non-Gaussian noise (\textit{e.g.}, random pixel corruption) and real disguises, 3) Our method has a closed-form solution and can be calculated very fast. Extensive experimental results show the superiority of PCE on a range of databases with respect to the classification accuracy, robustness and efficiency.

CVOct 31, 2014
Symmetric low-rank representation for subspace clustering

Jie Chen, Haixian Zhang, Hua Mao et al.

We propose a symmetric low-rank representation (SLRR) method for subspace clustering, which assumes that a data set is approximately drawn from the union of multiple subspaces. The proposed technique can reveal the membership of multiple subspaces through the self-expressiveness property of the data. In particular, the SLRR method considers a collaborative representation combined with low-rank matrix recovery techniques as a low-rank representation to learn a symmetric low-rank representation, which preserves the subspace structures of high-dimensional data. In contrast to performing iterative singular value decomposition in some existing low-rank representation based algorithms, the symmetric low-rank representation in the SLRR method can be calculated as a closed form solution by solving the symmetric low-rank optimization problem. By making use of the angular information of the principal directions of the symmetric low-rank representation, an affinity graph matrix is constructed for spectral clustering. Extensive experimental results show that it outperforms state-of-the-art subspace clustering algorithms.

CVSep 22, 2014
Fast Low-rank Representation based Spatial Pyramid Matching for Image Classification

Xi Peng, Rui Yan, Bo Zhao et al.

Spatial Pyramid Matching (SPM) and its variants have achieved a lot of success in image classification. The main difference among them is their encoding schemes. For example, ScSPM incorporates Sparse Code (SC) instead of Vector Quantization (VQ) into the framework of SPM. Although the methods achieve a higher recognition rate than the traditional SPM, they consume more time to encode the local descriptors extracted from the image. In this paper, we propose using Low Rank Representation (LRR) to encode the descriptors under the framework of SPM. Different from SC, LRR considers the group effect among data points instead of sparsity. Benefiting from this property, the proposed method (i.e., LrrSPM) can offer a better performance. To further improve the generalizability and robustness, we reformulate the rank-minimization problem as a truncated projection problem. Extensive experimental studies show that LrrSPM is more efficient than its counterparts (e.g., ScSPM) while achieving competitive recognition rates on nine image data sets.

CVMar 7, 2014
Subspace clustering using a symmetric low-rank representation

Jie Chen, Hua Mao, Yongsheng Sang et al.

In this paper, we propose a low-rank representation with symmetric constraint (LRRSC) method for robust subspace clustering. Given a collection of data points approximately drawn from multiple subspaces, the proposed technique can simultaneously recover the dimension and members of each subspace. LRRSC extends the original low-rank representation algorithm by integrating a symmetric constraint into the low-rankness property of high-dimensional data representation. The symmetric low-rank representation, which preserves the subspace structures of high-dimensional data, guarantees weight consistency for each pair of data points so that highly correlated data points of subspaces are represented together. Moreover, it can be efficiently calculated by solving a convex optimization problem. We provide a rigorous proof for minimizing the nuclear-norm regularized least square problem with a symmetric constraint. The affinity matrix for spectral clustering can be obtained by further exploiting the angular information of the principal directions of the symmetric low-rank representation. This is a critical step towards evaluating the memberships between data points. Experimental results on benchmark databases demonstrate the effectiveness and robustness of LRRSC compared with several state-of-the-art subspace clustering algorithms.

LGSep 25, 2013
A Unified Framework for Representation-based Subspace Clustering of Out-of-sample and Large-scale Data

Xi Peng, Huajin Tang, Lei Zhang et al.

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and $\ell_2$-norm-based representation, and have achieved state-of-the-art performance. However, these methods have suffered from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which make those methods inefficient for solving large-scale problems. Second, they cannot cope with out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework which makes representation-based subspace clustering algorithms feasible to cluster both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it as out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently-proposed scalable methods in clustering large-scale data set.

LGApr 24, 2013
Locally linear representation for image clustering

Liangli Zhen, Zhang Yi, Xi Peng et al.

It is a key to construct a similarity graph in graph-oriented subspace learning and clustering. In a similarity graph, each vertex denotes a data point and the edge weight represents the similarity between two points. There are two popular schemes to construct a similarity graph, i.e., pairwise distance based scheme and linear representation based scheme. Most existing works have only involved one of the above schemes and suffered from some limitations. Specifically, pairwise distance based methods are sensitive to the noises and outliers compared with linear representation based methods. On the other hand, there is the possibility that linear representation based algorithms wrongly select inter-subspaces points to represent a point, which will degrade the performance. In this paper, we propose an algorithm, called Locally Linear Representation (LLR), which integrates pairwise distance with linear representation together to address the problems. The proposed algorithm can automatically encode each data point over a set of points that not only could denote the objective point with less residual error, but also are close to the point in Euclidean space. The experimental results show that our approach is promising in subspace learning and subspace clustering.

LGMar 2, 2013
Inductive Sparse Subspace Clustering

Xi Peng, Lei Zhang, Zhang Yi

Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering quality by performing spectral clustering over a $\ell^{1}$-norm based similarity graph. However, SSC is a transductive method which does not handle with the data not used to construct the graph (out-of-sample data). For each new datum, SSC requires solving $n$ optimization problems in O(n) variables for performing the algorithm over the whole data set, where $n$ is the number of data points. Therefore, it is inefficient to apply SSC in fast online clustering and scalable graphing. In this letter, we propose an inductive spectral clustering algorithm, called inductive Sparse Subspace Clustering (iSSC), which makes SSC feasible to cluster out-of-sample data. iSSC adopts the assumption that high-dimensional data actually lie on the low-dimensional manifold such that out-of-sample data could be grouped in the embedding space learned from in-sample data. Experimental results show that iSSC is promising in clustering out-of-sample data.

CVOct 4, 2012
Learning Locality-Constrained Collaborative Representation for Face Recognition

Xi Peng, Lei Zhang, Zhang Yi et al.

The model of low-dimensional manifold and sparse representation are two well-known concise models that suggest each data can be described by a few characteristics. Manifold learning is usually investigated for dimension reduction by preserving some expected local geometric structures from the original space to a low-dimensional one. The structures are generally determined by using pairwise distance, e.g., Euclidean distance. Alternatively, sparse representation denotes a data point as a linear combination of the points from the same subspace. In practical applications, however, the nearby points in terms of pairwise distance may not belong to the same subspace, and vice versa. Consequently, it is interesting and important to explore how to get a better representation by integrating these two models together. To this end, this paper proposes a novel coding algorithm, called Locality-Constrained Collaborative Representation (LCCR), which improves the robustness and discrimination of data representation by introducing a kind of local consistency. The locality term derives from a biologic observation that the similar inputs have similar code. The objective function of LCCR has an analytical solution, and it does not involve local minima. The empirical studies based on four public facial databases, ORL, AR, Extended Yale B, and Multiple PIE, show that LCCR is promising in recognizing human faces from frontal views with varying expression and illumination, as well as various corruptions and occlusions.

CVSep 5, 2012
Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering

Xi Peng, Zhiding Yu, Huajin Tang et al.

Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intra-subspace data points). Recent works achieve good performance by modeling errors into their objective functions to remove the errors from the inputs. However, these approaches face the limitations that the structure of errors should be known prior and a complex convex problem must be solved. In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. We first prove that $\ell_1$-, $\ell_2$-, $\ell_{\infty}$-, and nuclear-norm based linear projection spaces share the property of Intra-subspace Projection Dominance (IPD), i.e., the coefficients over intra-subspace data points are larger than those over inter-subspace data points. Based on this property, we introduce a method to construct a sparse similarity graph, called L2-Graph. The subspace clustering and subspace learning algorithms are developed upon L2-Graph. Experiments show that L2-Graph algorithms outperform the state-of-the-art methods for feature extraction, image clustering, and motion segmentation in terms of accuracy, robustness, and time efficiency.