Huazhong Shu

CV
h-index31
30papers
663citations
Novelty43%
AI Score30

30 Papers

CVJun 15, 2022Code
XMorpher: Full Transformer for Deformable Medical Image Registration via Cross Attention

Jiacheng Shi, Yuting He, Youyong Kong et al.

An effective backbone network is important to deep learning-based Deformable Medical Image Registration (DMIR), because it extracts and matches the features between two images to discover the mutual correspondence for fine registration. However, the existing deep networks focus on single image situation and are limited in registration task which is performed on paired images. Therefore, we advance a novel backbone network, XMorpher, for the effective corresponding feature representation in DMIR. 1) It proposes a novel full transformer architecture including dual parallel feature extraction networks which exchange information through cross attention, thus discovering multi-level semantic correspondence while extracting respective features gradually for final effective registration. 2) It advances the Cross Attention Transformer (CAT) blocks to establish the attention mechanism between images which is able to find the correspondence automatically and prompts the features to fuse efficiently in the network. 3) It constrains the attention computation between base windows and searching windows with different sizes, and thus focuses on the local transformation of deformable registration and enhances the computing efficiency at the same time. Without any bells and whistles, our XMorpher gives Voxelmorph 2.8% improvement on DSC , demonstrating its effective representation of the features from the paired images in DMIR. We believe that our XMorpher has great application potential in more paired medical images. Our XMorpher is open on https://github.com/Solemoon/XMorpher

IVMay 10, 2022Code
MNet: Rethinking 2D/3D Networks for Anisotropic Medical Image Segmentation

Zhangfu Dong, Yuting He, Xiaoming Qi et al.

The nature of thick-slice scanning causes severe inter-slice discontinuities of 3D medical images, and the vanilla 2D/3D convolutional neural networks (CNNs) fail to represent sparse inter-slice information and dense intra-slice information in a balanced way, leading to severe underfitting to inter-slice features (for vanilla 2D CNNs) and overfitting to noise from long-range slices (for vanilla 3D CNNs). In this work, a novel mesh network (MNet) is proposed to balance the spatial representation inter axes via learning. 1) Our MNet latently fuses plenty of representation processes by embedding multi-dimensional convolutions deeply into basic modules, making the selections of representation processes flexible, thus balancing representation for sparse inter-slice information and dense intra-slice information adaptively. 2) Our MNet latently fuses multi-dimensional features inside each basic module, simultaneously taking the advantages of 2D (high segmentation accuracy of the easily recognized regions in 2D view) and 3D (high smoothness of 3D organ contour) representations, thus obtaining more accurate modeling for target regions. Comprehensive experiments are performed on four public datasets (CT\&MR), the results consistently demonstrate the proposed MNet outperforms the other methods. The code and datasets are available at: https://github.com/zfdong-code/MNet

NAFeb 24, 2012
L1-norm minimization for quaternion signals

Jiasong Wu, Xu Zhang, Xiaoqing Wang et al.

The l1-norm minimization problem plays an important role in the compressed sensing (CS) theory. We present in this letter an algorithm for solving the problem of l1-norm minimization for quaternion signals by converting it to second-order cone programming. An application example of the proposed algorithm is also given for practical guidelines of perfect recovery of quaternion signals. The proposed algorithm may find its potential application when CS theory meets the quaternion signal processing.

CVMar 13, 2024Code
Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks

Fuzhi Wu, Jiasong Wu, Youyong Kong et al.

Deep learning and Convolutional Neural Networks (CNNs) have driven major transformations in diverse research areas. However, their limitations in handling low-frequency information present obstacles in certain tasks like interpreting global structures or managing smooth transition images. Despite the promising performance of transformer structures in numerous tasks, their intricate optimization complexities highlight the persistent need for refined CNN enhancements using limited resources. Responding to these complexities, we introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network, with the goal to harness the full potential of CNNs while keeping their complexity unchanged. The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks. Central to our MLFM is the Low-Frequency Memory Unit (LFMU), which stores various low-frequency data and forms a parallel channel to the core network. A key advantage of MLFM is its seamless compatibility with various prevalent networks, requiring no alterations to their original core structure. Testing on ImageNet demonstrated substantial accuracy improvements in multiple 2D CNNs, including ResNet, MobileNet, EfficientNet, and ConvNeXt. Furthermore, we showcase MLFM's versatility beyond traditional image classification by successfully integrating it into image-to-image translation tasks, specifically in semantic segmentation networks like FCN and U-Net. In conclusion, our work signifies a pivotal stride in the journey of optimizing the efficacy and efficiency of CNNs with limited resources. This research builds upon the existing CNN foundations and paves the way for future advancements in computer vision. Our codes are available at https://github.com/AlphaWuSeu/ MLFM.

ASOct 30, 2021Code
Self-Supervised Speech Denoising Using Only Noisy Audio Signals

Jiasong Wu, Qingchun Li, Guanyu Yang et al.

In traditional speech denoising tasks, clean audio signals are often used as the training target, but absolutely clean signals are collected from expensive recording equipment or in studios with the strict environments. To overcome this drawback, we propose an end-to-end self-supervised speech denoising training scheme using only noisy audio signals, named Only-Noisy Training (ONT), without extra training conditions. The proposed ONT strategy constructs training pairs only from each single noisy audio, and it contains two modules: training audio pairs generated module and speech denoising module. The first module adopts a random audio sub-sampler on each noisy audio to generate training pairs. The sub-sampled pairs are then fed into a novel complex-valued speech denoising module. Experimental results show that the proposed method not only eliminates the high dependence on clean targets of traditional audio denoising tasks, but also achieves on-par or better performance than other training strategies. Availability-ONT is available at https://github.com/liqingchunnnn/Only-Noisy-Training

ASJul 21, 2020Code
CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language

Jiasong Wu, Xuan Li, Taotao Li et al.

Previous audio-visual speech separation methods use the synchronization of the speaker's facial movement and speech in the video to supervise the speech separation in a self-supervised way. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network for learning the combination of three modalities, audio, face, and sign language information, for better solving the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which three modalities of audio, face, and sign language coexist. Experiment results show that the proposed model has better performance and robustness than the usual audio-visual system. Besides, sign language modality can also be used alone to supervise speech separation tasks, and the introduction of sign language is helpful for hearing-impaired people to learn and communicate. Last, our model is a general speech separation framework and can achieve very competitive separation performance on two open-source audio-visual datasets. The code is available at https://github.com/iveveive/SLNSpeech

CVMar 15, 2024
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images

Xiangtian Xue, Jiasong Wu, Youyong Kong et al.

We present a novel image editing scenario termed Text-grounded Object Generation (TOG), defined as generating a new object in the real image spatially conditioned by textual descriptions. Existing diffusion models exhibit limitations of spatial perception in complex real-world scenes, relying on additional modalities to enforce constraints, and TOG imposes heightened challenges on scene comprehension under the weak supervision of linguistic information. We propose a universal framework ST-LDM based on Swin-Transformer, which can be integrated into any latent diffusion model with training-free backward guidance. ST-LDM encompasses a global-perceptual autoencoder with adaptable compression scales and hierarchical visual features, parallel with deformable multimodal transformer to generate region-wise guidance for the subsequent denoising process. We transcend the limitation of traditional attention mechanisms that only focus on existing visual features by introducing deformable feature alignment to hierarchically refine spatial positioning fused with multi-scale visual and linguistic information. Extensive Experiments demonstrate that our model enhances the localization of attention mechanisms while preserving the generative capabilities inherent to diffusion models.

IVFeb 5, 2024
Beyond Strong labels: Weakly-supervised Learning Based on Gaussian Pseudo Labels for The Segmentation of Ellipse-like Vascular Structures in Non-contrast CTs

Qixiang Ma, Antoine Łucas, Huazhong Shu et al.

Deep-learning-based automated segmentation of vascular structures in preoperative CT scans contributes to computer-assisted diagnosis and intervention procedure in vascular diseases. While CT angiography (CTA) is the common standard, non-contrast CT imaging is significant as a contrast-risk-free alternative, avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a weakly-supervised framework using ellipses' topology in slices, including 1) an efficient annotation process based on predefined standards, 2) ellipse-fitting processing, 3) the generation of 2D Gaussian heatmaps serving as pseudo labels, 4) a training process through a combination of voxel reconstruction loss and distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, particularly focusing on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (1.54\% of Dice score on average), reducing labeling time by around 82.0\%. The efficiency in generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74\% of Dice score on average) with a reduction of 66.3\% labeling time, where the labeling time remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95\% in Dice score for 2D models while a reduction of 11.65 voxel spacing in Hausdorff distance for 3D model.

CVMar 14, 2024
Rethinking Referring Object Removal

Xiangtian Xue, Jiasong Wu, Youyong Kong et al.

Referring object removal refers to removing the specific object in an image referred by natural language expressions and filling the missing region with reasonable semantics. To address this task, we construct the ComCOCO, a synthetic dataset consisting of 136,495 referring expressions for 34,615 objects in 23,951 image pairs. Each pair contains an image with referring expressions and the ground truth after elimination. We further propose an end-to-end syntax-aware hybrid mapping network with an encoding-decoding structure. Linguistic features are hierarchically extracted at the syntactic level and fused in the downsampling process of visual features with multi-head attention. The feature-aligned pyramid network is leveraged to generate segmentation masks and replace internal pixels with region affinity learned from external semantics in high-level feature maps. Extensive experiments demonstrate that our model outperforms diffusion models and two-stage methods which process the segmentation and inpainting task separately by a significant margin.

IVJun 8, 2021
EnMcGAN: Adversarial Ensemble Learning for 3D Complete Renal Structures Segmentation

Yuting He, Rongjun Ge, Xiaoming Qi et al.

3D complete renal structures(CRS) segmentation targets on segmenting the kidneys, tumors, renal arteries and veins in one inference. Once successful, it will provide preoperative plans and intraoperative guidance for laparoscopic partial nephrectomy(LPN), playing a key role in the renal cancer treatment. However, no success has been reported in 3D CRS segmentation due to the complex shapes of renal structures, low contrast and large anatomical variation. In this study, we utilize the adversarial ensemble learning and propose Ensemble Multi-condition GAN(EnMcGAN) for 3D CRS segmentation for the first time. Its contribution is three-fold. 1)Inspired by windowing, we propose the multi-windowing committee which divides CTA image into multiple narrow windows with different window centers and widths enhancing the contrast for salient boundaries and soft tissues. And then, it builds an ensemble segmentation model on these narrow windows to fuse the segmentation superiorities and improve whole segmentation quality. 2)We propose the multi-condition GAN which equips the segmentation model with multiple discriminators to encourage the segmented structures meeting their real shape conditions, thus improving the shape feature extraction ability. 3)We propose the adversarial weighted ensemble module which uses the trained discriminators to evaluate the quality of segmented structures, and normalizes these evaluation scores for the ensemble weights directed at the input image, thus enhancing the ensemble results. 122 patients are enrolled in this study and the mean Dice coefficient of the renal structures achieves 84.6%. Extensive experiments with promising results on renal structures reveal powerful segmentation accuracy and great clinical significance in renal cancer treatment.

CVAug 3, 2020
Deep Complementary Joint Model for Complex Scene Registration and Few-shot Segmentation on Medical Images

Yuting He, Tiantian Li, Guanyu Yang et al.

Deep learning-based medical image registration and segmentation joint models utilize the complementarity (augmentation data or weakly supervised data from registration, region constraints from segmentation) to bring mutual improvement in complex scene and few-shot situation. However, further adoption of the joint models are hindered: 1) the diversity of augmentation data is reduced limiting the further enhancement of segmentation, 2) misaligned regions in weakly supervised data disturb the training process, 3) lack of label-based region constraints in few-shot situation limits the registration performance. We propose a novel Deep Complementary Joint Model (DeepRS) for complex scene registration and few-shot segmentation. We embed a perturbation factor in the registration to increase the activity of deformation thus maintaining the augmentation data diversity. We take a pixel-wise discriminator to extract alignment confidence maps which highlight aligned regions in weakly supervised data so the misaligned regions' disturbance will be suppressed via weighting. The outputs from segmentation model are utilized to implement deep-based region constraints thus relieving the label requirements and bringing fine registration. Extensive experiments on the CT dataset of MM-WHS 2017 Challenge show great advantages of our DeepRS that outperforms the existing state-of-the-art models.

CVJul 28, 2020
Generative networks as inverse problems with fractional wavelet scattering networks

Jiasong Wu, Jing Zhang, Fuzhi Wu et al.

Deep learning is a hot research topic in the field of machine learning methods and applications. Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) provide impressive image generations from Gaussian white noise, but both of them are difficult to train since they need to train the generator (or encoder) and the discriminator (or decoder) simultaneously, which is easy to cause unstable training. In order to solve or alleviate the synchronous training difficult problems of GANs and VAEs, recently, researchers propose Generative Scattering Networks (GSNs), which use wavelet scattering networks (ScatNets) as the encoder to obtain the features (or ScatNet embeddings) and convolutional neural networks (CNNs) as the decoder to generate the image. The advantage of GSNs is the parameters of ScatNets are not needed to learn, and the disadvantage of GSNs is that the expression ability of ScatNets is slightly weaker than CNNs and the dimensional reduction method of Principal Component Analysis (PCA) is easy to lead overfitting in the training of GSNs, and therefore affect the generated quality in the testing process. In order to further improve the quality of generated images while keep the advantages of GSNs, this paper proposes Generative Fractional Scattering Networks (GFRSNs), which use more expressive fractional wavelet scattering networks (FrScatNets) instead of ScatNets as the encoder to obtain the features (or FrScatNet embeddings) and use the similar CNNs of GSNs as the decoder to generate the image. Additionally, this paper develops a new dimensional reduction method named Feature-Map Fusion (FMF) instead of PCA for better keeping the information of FrScatNets and the effect of image fusion on the quality of image generation is also discussed.

CVMar 8, 2020
Trajectory Grouping with Curvature Regularization for Tubular Structure Tracking

Li Liu, Da Chen, Minglei Shu et al.

Tubular structure tracking is a crucial task in the fields of computer vision and medical image analysis. The minimal paths-based approaches have exhibited their strong ability in tracing tubular structures, by which a tubular structure can be naturally modeled as a minimal geodesic path computed with a suitable geodesic metric. However, existing minimal paths-based tracing approaches still suffer from difficulties such as the shortcuts and short branches combination problems, especially when dealing with the images involving complicated tubular tree structures or background. In this paper, we introduce a new minimal paths-based model for minimally interactive tubular structure centerline extraction in conjunction with a perceptual grouping scheme. Basically, we take into account the prescribed tubular trajectories and curvature-penalized geodesic paths to seek suitable shortest paths. The proposed approach can benefit from the local smoothness prior on tubular structures and the global optimality of the used graph-based path searching scheme. Experimental results on both synthetic and real images prove that the proposed model indeed obtains outperformance comparing with the state-of-the-art minimal paths-based tubular structure tracing algorithms.

CVDec 20, 2019
A Region-based Randers Geodesic Approach for Image Segmentation

Da Chen, Jean-Marie Mirebeau, Huazhong Shu et al.

The geodesic model based on the eikonal partial differential equation (PDE) has served as a fundamental tool for the applications of image segmentation and boundary detection in the past two decades. However, the existing approaches commonly only exploit the image edge-based features for computing minimal geodesic paths, potentially limiting their performance in complicated segmentation situations. In this paper, we introduce a new variational image segmentation model based on the minimal geodesic path framework and the eikonal PDE, where the region-based appearance term that defines then regional homogeneity features can be taken into account for estimating the associated minimal geodesic paths. This is done by constructing a Randers geodesic metric interpretation of the region-based active contour energy functional. As a result, the minimization of the active contour energy functional is transformed into finding the solution to the Randers eikonal PDE. We also suggest a practical interactive image segmentation strategy, where the target boundary can be delineated by the concatenation of several piecewise geodesic paths. We invoke the Finsler variant of the fast marching method to estimate the geodesic distance map, yielding an efficient implementation of the proposed region-based Randers geodesic model for image segmentation. Experimental results on both synthetic and real images exhibit that our model indeed achieves encouraging segmentation performance.

CVMar 20, 2019
Deep Octonion Networks

Jiasong Wu, Ling Xu, Youyong Kong et al.

Deep learning is a research hot topic in the field of machine learning. Real-value neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, the deep complex networks (DCNs) and the deep quaternion networks (DQNs) have attracted more and more attentions. The octonion algebra, which is an extension of complex algebra and quaternion algebra, can provide more efficient and compact expression. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs such as octonion convolution, octonion batch normalization and octonion weight initialization; DONs are then used in image classification tasks for CIFAR-10 and CIFAR-100 data sets. Compared with the DRNs, the DCNs, and the DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained by multi-task learning.

CVMar 6, 2019
Compressing complex convolutional neural network based on an improved deep compression algorithm

Jiasong Wu, Hongshan Ren, Youyong Kong et al.

Although convolutional neural network (CNN) has made great progress, large redundant parameters restrict its deployment on embedded devices, especially mobile devices. The recent compression works are focused on real-value convolutional neural network (Real CNN), however, to our knowledge, there is no attempt for the compression of complex-value convolutional neural network (Complex CNN). Compared with the real-valued network, the complex-value neural network is easier to optimize, generalize, and has better learning potential. This paper extends the commonly used deep compression algorithm from real domain to complex domain and proposes an improved deep compression algorithm for the compression of Complex CNN. The proposed algorithm compresses the network about 8 times on CIFAR-10 dataset with less than 3% accuracy loss. On the ImageNet dataset, our method compresses the model about 16 times and the accuracy loss is about 2% without retraining.

CVFeb 27, 2019
Fractional spectral graph wavelets and their applications

Jiasong Wu, Fuzhi Wu, Qihan Yang et al.

One of the key challenges in the area of signal processing on graphs is to design transforms and dictionaries methods to identify and exploit structure in signals on weighted graphs. In this paper, we first generalize graph Fourier transform (GFT) to graph fractional Fourier transform (GFRFT), which is then used to define a novel transform named spectral graph fractional wavelet transform (SGFRWT), which is a generalized and extended version of spectral graph wavelet transform (SGWT). A fast algorithm for SGFRWT is also derived and implemented based on Fourier series approximation. The potential applications of SGFRWT are also presented.

CVFeb 27, 2019
Modulated binary cliquenet

Jinpeng Xia, Jiasong Wu, Youyong Kong et al.

Although Convolutional Neural Networks (CNNs) achieve effectiveness in various computer vision tasks, the significant requirement of storage of such networks hinders the deployment on computationally limited devices. In this paper, we propose a new compact and portable deep learning network named Modulated Binary Cliquenet (MBCliqueNet) aiming to improve the portability of CNNs based on binarized filters while achieving comparable performance with the full-precision CNNs like Resnet. In MBCliqueNet, we introduce a novel modulated operation to approximate the unbinarized filters and gives an initialization method to speed up its convergence. We reduce the extra parameters caused by modulated operation with parameters sharing. As a result, the proposed MBCliqueNet can reduce the required storage space of convolutional filters by a factor of at least 32, in contrast to the full-precision model, and achieve better performance than other state-of-the-art binarized models. More importantly, our model compares even better with some full-precision models like Resnet on the dataset we used.

CVNov 14, 2018
Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices

Xu Han, Laurent Albera, Amar Kachenoura et al.

In order to compute the best low-rank tensor approximation using the Multilinear Tensor Decomposition (MTD) model, it is essential to estimate the rank of the underlying multilinear tensor from the noisy observation tensor. In this paper, we propose a Robust MTD (R-MTD) method, which jointly estimates the multilinear rank and the loading matrices. Based on the low-rank property and an over-estimation of the core tensor, this joint estimation problem is solved by promoting (group) sparsity of the over-estimated core tensor. Group sparsity is promoted using mixed-norms. Then we establish a link between the mixed-norms and the nuclear norm, showing that mixed-norms are better candidates for a convex envelope of the rank. After several iterations of the Alternating Direction Method of Multipliers (ADMM), the Minimum Description Length (MDL) criterion computed from the eigenvalues of the unfolding matrices of the estimated core tensor is minimized in order to estimate the multilinear rank. The latter is then used to estimate more accurately the loading matrices. We further develop another R-MTD method, called R-OMTD, by imposing an orthonormality constraint on each loading matrix in order to decrease the computation complexity. A series of simulated noisy tensor and real-world data are used to show the effectiveness of the proposed methods compared with state-of-the-art methods.

CVJun 30, 2018
Fractional Wavelet Scattering Network and Applications

Li Liu, Jiasong Wu, Dengwang Li et al.

Objective: The present study introduces a fractional wavelet scattering network (FrScatNet), which is a generalized translation invariant version of the classical wavelet scattering network (ScatNet). Methods: In our approach, the FrScatNet is constructed based on the fractional wavelet transform (FRWT). The fractional scattering coefficients are iteratively computed using FRWTs and modulus operators. The feature vectors constructed by fractional scattering coefficients are usually used for signal classification. In this work, an application example of FrScatNet is provided in order to assess its performance on pathological images. Firstly, the FrScatNet extracts feature vectors from patches of the original histological images under different orders. Then we classify those patches into target (benign or malignant) and background groups. And the FrScatNet property is analyzed by comparing error rates computed from different fractional orders respectively. Based on the above pathological image classification, a gland segmentation algorithm is proposed by combining the boundary information and the gland location. Results: The error rates for different fractional orders of FrScatNet are examined and show that the classification accuracy is significantly improved in fractional scattering domain. We also compare the FrScatNet based gland segmentation method with those proposed in the 2015 MICCAI Gland Segmentation Challenge and our method achieves comparable results. Conclusion: The FrScatNet is shown to achieve accurate and robust results. More stable and discriminative fractional scattering coefficients are obtained by the FrScatNet in this work. Significance: The added fractional order parameter is able to analyze the image in the fractional scattering domain.

CVFeb 22, 2017
MomentsNet: a simple learning-free method for binary image recognition

Jiasong Wu, Shijie Qiu, Youyong Kong et al.

In this paper, we propose a new simple and learning-free deep learning network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed by Moments kernels, binary hashing and block-wise histogram, respectively. Twelve typical moments (including geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to construct the MomentsNet whose recognition performance for binary image is studied. The results reveal that MomentsNet has better recognition performance than its corresponding moments in almost all cases and ZernikeNet achieves the best recognition performance among MomentsNet constructed by twelve moments. ZernikeNet also shows better recognition performance on binary image database than that of PCANet, which is a learning-based deep learning network.

QMMay 12, 2016
Heart Rate Variability and Respiration Signal as Diagnostic Tools for Late Onset Sepsis in Neonatal Intensive Care Units

Yuan Wang, Guy Carrault, Alain Beuchee et al.

Apnea-bradycardia is one of the major clinical early indicators of late-onset sepsis occurring in approximately 7% to 10% of all neonates and in more than 25% of very low birth weight infants in NICU. The objective of this paper was to determine if HRV, respiration and their relationships help to diagnose infection in premature infants via non-invasive ways in NICU. Therefore, we implement Mono-Channel (MC) and Bi-Channel (BC) Analysis in two groups: sepsis (S) vs. non-sepsis (NS). Firstly, we studied RR series not only by linear methods: time domain and frequency domain, but also by non-linear methods: chaos theory and information theory. The results show that alpha Slow, alpha Fast and Sample Entropy are significant parameters to distinguish S from NS. Secondly, the question about the functional coupling of HRV and nasal respiration is addressed. Local linear correlation coefficient r2t,f has been explored, while non-linear regression coefficient h2 was calculated in two directions. It is obvious that r2t,f within the third frequency band (0.2<f<0.4 Hz) and h2 in two directions were complementary approaches to diagnose sepsis. Thirdly, feasibility study is carried out on the candidate parameters selected from MC and BC respectively. We discovered that the proposed test based on optimal fusion of 6 features shows good performance with the largest AUC and a reduced probability of false alarm (PFA).

CVMar 3, 2016
PCANet: An energy perspective

Jiasong Wu, Shijie Qiu, Youyong Kong et al.

The principal component analysis network (PCANet), which is one of the recently proposed deep learning architectures, achieves the state-of-the-art classification accuracy in various databases. However, the explanation of the PCANet is lacked. In this paper, we try to explain why PCANet works well from energy perspective point of view based on a set of experiments. The impact of various parameters on the error rate of PCANet is analyzed in depth. It was found that this error rate is correlated with the logarithm of energy of image. The proposed energy explanation approach can be used as a testing method for checking if every step of the constructed networks is necessary.

LGDec 20, 2015
Kernel principal component analysis network for image classification

Dan Wu, Jiasong Wu, Rui Zeng et al.

In order to classify the nonlinear feature with linear classifier and improve the classification accuracy, a deep learning network named kernel principal component analysis network (KPCANet) is proposed. First, mapping the data into higher space with kernel principal component analysis to make the data linearly separable. Then building a two-layer KPCANet to obtain the principal components of image. Finally, classifying the principal components with linearly classifier. Experimental results show that the proposed KPCANet is effective in face recognition, object recognition and hand-writing digits recognition, it also outperforms principal component analysis network (PCANet) generally as well. Besides, KPCANet is invariant to illumination and stable to occlusion and slight deformation.

CVMar 5, 2015
Color Image Classification via Quaternion Principal Component Analysis Network

Rui Zeng, Jiasong Wu, Zhuhong Shao et al.

The Principal Component Analysis Network (PCANet), which is one of the recently proposed deep learning architectures, achieves the state-of-the-art classification accuracy in various databases. However, the performance of PCANet may be degraded when dealing with color images. In this paper, a Quaternion Principal Component Analysis Network (QPCANet), which is an extension of PCANet, is proposed for color images classification. Compared to PCANet, the proposed QPCANet takes into account the spatial distribution information of color images and ensures larger amount of intra-class invariance of color images. Experiments conducted on different color image datasets such as Caltech-101, UC Merced Land Use, Georgia Tech face and CURet have revealed that the proposed QPCANet achieves higher classification accuracy than PCANet.

CVNov 5, 2014
Tensor object classification via multilinear discriminant analysis network

Rui Zeng, Jiasong Wu, Lotfi Senhadji et al.

This paper proposes a multilinear discriminant analysis network (MLDANet) for the recognition of multidimensional objects, known as tensor objects. The MLDANet is a variation of linear discriminant analysis network (LDANet) and principal component analysis network (PCANet), both of which are the recently proposed deep learning algorithms. The MLDANet consists of three parts: 1) The encoder learned by MLDA from tensor data. 2) Features maps ob-tained from decoder. 3) The use of binary hashing and histogram for feature pooling. A learning algorithm for MLDANet is described. Evaluations on UCF11 database indicate that the proposed MLDANet outperforms the PCANet, LDANet, MPCA + LDA, and MLDA in terms of classification for tensor objects.

CVNov 5, 2014
Multilinear Principal Component Analysis Network for Tensor Object Classification

Rui Zeng, Jiasong Wu, Zhuhong Shao et al.

The recently proposed principal component analysis network (PCANet) has been proved high performance for visual content classification. In this letter, we develop a tensorial extension of PCANet, namely, multilinear principal analysis component network (MPCANet), for tensor object classification. Compared to PCANet, the proposed MPCANet uses the spatial structure and the relationship between each dimension of tensor objects much more efficiently. Experiments were conducted on different visual content datasets including UCF sports action video sequences database and UCF11 database. The experimental results have revealed that the proposed MPCANet achieves higher classification accuracy than PCANet for tensor object classification.

CVJul 24, 2014
Performance evaluation of wavelet scattering network in image texture classification in various color spaces

Jiasong Wu, Longyu Jiang, Xu Han et al.

Texture plays an important role in many image analysis applications. In this paper, we give a performance evaluation of color texture classification by performing wavelet scattering network in various color spaces. Experimental results on the KTH_TIPS_COL database show that opponent RGB based wavelet scattering network outperforms other color spaces. Therefore, when dealing with the problem of color texture classification, opponent RGB based wavelet scattering network is recommended.

CVMar 12, 2014
Efficient Legendre moment computation for grey level images

Guanyu Yang, Huazhong Shu, Christine Toumoulin et al.

Legendre orthogonal moments have been widely used in the field of image analysis. Because their computation by a direct method is very time expensive, recent efforts have been devoted to the reduction of computational complexity. Nevertheless, the existing algorithms are mainly focused on binary images. We propose here a new fast method for computing the Legendre moments, which is not only suitable for binary images but also for grey levels. We first set up the recurrence formula of one-dimensional (1D) Legendre moments by using the recursive property of Legendre polynomials. As a result, the 1D Legendre moments of order p, Lp = Lp(0), can be expressed as a linear combination of Lp-1(1) and Lp-2(0). Based on this relationship, the 1D Legendre moments Lp(0) is thus obtained from the array of L1(a) and L0(a) where a is an integer number less than p. To further decrease the computation complexity, an algorithm, in which no multiplication is required, is used to compute these quantities. The method is then extended to the calculation of the two-dimensional Legendre moments Lpq. We show that the proposed method is more efficient than the direct method.

CVMar 12, 2014
Image reconstruction from limited range projections using orthogonal moments

Huazhong Shu, Jian Zhou, Guo-Niu Han et al.

A set of orthonormal polynomials is proposed for image reconstruction from projection data. The relationship between the projection moments and image moments is discussed in detail, and some interesting properties are demonstrated. Simulation results are provided to validate the method and to compare its performance with previous works.