CV · May 19 · Code
MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
Yujie Wei, Yujin Han, Zhekai Chen et al.
Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions, video, audio, shot, and reference, covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.
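As an illustration of the human-alignment figure quoted above, the following is a minimal sketch of how a Spearman rank correlation between automatic benchmark scores and human ratings can be computed. The score lists are made-up placeholders, not data from the paper, and the helper names (`rank`, `pearson`, `spearman`) are hypothetical.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


if __name__ == "__main__":
    # Hypothetical per-model scores (assumption, for illustration only).
    benchmark_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
    human_ratings = [3.1, 3.9, 2.8, 4.5, 3.0]
    print(spearman(benchmark_scores, human_ratings))
```

A value close to 1 means the benchmark orders the models almost exactly as human raters do, which is the property the reported correlation describes.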
QUANT-PH · May 3
Quantum Monte Carlo algorithm for option pricing and its complexity analysis
Jianjun Chen, Yongming Li, Ariel Neufeld
In this paper we provide a quantum Monte Carlo algorithm to solve multidimensional Black-Scholes PDEs with correlation for option pricing. The payoff function of the option is of general form and is only required to be continuous and piecewise affine, which covers most of the relevant payoff functions used in finance. We provide a rigorous error analysis and complexity analysis of our algorithm. In particular, we prove that the computational complexity of our algorithm is bounded polynomially in the space dimension $d$ of the PDE and the reciprocal of the prescribed accuracy $\varepsilon$. Moreover, we show that for payoff functions which are bounded, our algorithm indeed has a speed-up compared to classical Monte Carlo methods. Furthermore, we provide numerical simulations in two dimensions using our developed package within the Qiskit framework tailored to price continuous piecewise affine options with respect to the Black-Scholes model, as well as discuss the potential extension of the numerical simulations to arbitrary space dimension.
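For context, the classical Monte Carlo baseline that the quantum algorithm is compared against can be sketched as follows. This is not the authors' Qiskit package. It is a plain-Python sketch under stated assumptions: correlated geometric Brownian motion under the risk-neutral measure, and a basket call as one example of a continuous piecewise affine payoff. All function names and parameter values are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def basket_call(prices, strike):
    """Continuous piecewise affine payoff: max(mean(prices) - strike, 0)."""
    return max(sum(prices) / len(prices) - strike, 0.0)


def mc_price(s0, sigma, corr, r, maturity, payoff, n_paths, seed=0):
    """Discounted mean payoff over simulated terminal prices."""
    rng = random.Random(seed)
    d = len(s0)
    low = cholesky(corr)
    total = 0.0
    for _ in range(n_paths):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals with the Cholesky factor.
        w = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        terminal = [
            s0[i] * math.exp((r - 0.5 * sigma[i] ** 2) * maturity
                             + sigma[i] * math.sqrt(maturity) * w[i])
            for i in range(d)
        ]
        total += payoff(terminal)
    return math.exp(-r * maturity) * total / n_paths


if __name__ == "__main__":
    corr = [[1.0, 0.3], [0.3, 1.0]]  # assumed correlation matrix
    price = mc_price([100.0, 100.0], [0.2, 0.25], corr, 0.05, 1.0,
                     lambda s: basket_call(s, 100.0), n_paths=20000)
    print(price)
```

The statistical error of this estimator shrinks like the inverse square root of the number of paths, which is the cost the abstract's speed-up claim for bounded payoffs is measured against.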
CV · Apr 16, 2024 · Code
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Bin Ren, Yawei Li, Nancy Mehta et al.
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. Together, these submissions gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
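The PSNR threshold mentioned above follows the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below shows that definition on nested lists of pixel values. It is an illustration only: the challenge's exact evaluation protocol (colour channel, border cropping) is not stated in this abstract, so none of that is modelled here, and the function name is hypothetical.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    ref_flat = [p for row in reference for p in row]
    res_flat = [p for row in restored for p in row]
    if len(ref_flat) != len(res_flat):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, res_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    high_res = [[100, 100], [50, 50]]       # toy ground-truth patch
    super_resolved = [[110, 90], [50, 50]]  # toy network output
    print(psnr(high_res, super_resolved))
```

Because the scale is logarithmic, a gap of a few hundredths of a dB, like the one between the validation and test thresholds, corresponds to a very small change in mean squared error.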
LG · Jun 25, 2022
Envelope imbalanced ensemble model with deep sample learning and local-global structure consistency
Fan Li, Xiaoheng Zhang, Yongming Li et al.
The class imbalance problem is important and challenging. Ensemble approaches are widely used to tackle this problem because of their effectiveness. However, existing ensemble methods are always applied to the original samples and do not consider the structure information among them, which limits the performance of imbalanced learning. Moreover, research shows that the structure information among samples includes both local and global structure information. Based on this analysis, an imbalanced ensemble algorithm with the deep sample envelope pre-network (DSEN) and local-global structure consistency mechanism (LGSCM) is proposed here to solve the problem. This algorithm can guarantee high-quality deep envelope samples by considering the local manifold and global structure information, which is helpful for imbalanced learning. First, the DSEN is designed to mine structure information among samples. Then, the local manifold structure metric (LMSM) and global structure distribution metric (GSDM) are designed to construct the LGSCM, which enhances the distribution consistency of interlayer samples. Next, the DSEN and LGSCM are combined to form the final deep sample envelope network (DSEN-LG). After that, base classifiers are applied to each layer of deep samples. Finally, the predictions of the base classifiers are fused through a bagging ensemble learning mechanism. To demonstrate the effectiveness of the proposed method, forty-four public datasets and more than ten representative relevant algorithms are chosen for verification. The experimental results show that the algorithm is significantly better than other imbalanced ensemble algorithms.
HC · Oct 23, 2023
Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents
Qingxiao Zheng, Zhongwei Xu, Abhinav Choudhry et al.
This empirical study serves as a primer for interested service providers to determine if and how Large Language Model (LLM) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design process, we work with 23 domain experts from public libraries across the U.S., uncovering their fundamental challenges of integrating AI into human workflows. Our findings provide 23 actionable "heuristics for service co-creation with AI", highlighting the nuanced shared responsibilities between humans and AI. We further exemplify 9 foundational agency aspects for AI, emphasizing essentials like ownership, fair treatment, and freedom of expression. Our innovative approach enriches the participatory design model by incorporating AI as crucial stakeholders and utilizing AI-AI interaction to identify blind spots. Collectively, these insights pave the way for synergistic and ethical human-AI co-creation in service contexts, preparing for workforce ecosystems where AI coexists.
LG · Nov 30, 2022
Overlapping oriented imbalanced ensemble learning method based on projective clustering and stagewise hybrid sampling
Fan Li, Bo Wang, Pin Wang et al.
The challenge of imbalanced learning lies not only in the class imbalance problem but also in the more complex class overlapping problem. However, most existing algorithms focus mainly on the former, which limits their performance. To address this limitation, this paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS). The DCSHS has three parts. Firstly, we design a projection clustering combination framework (PCC) guided by the Davies-Bouldin clustering effectiveness index (DBI), which is used to obtain high-quality clusters and combine them into a set of cross-complete subsets (CCS) with balanced classes and low overlapping. Secondly, according to the characteristics of the subset classes, a stage-wise hybrid sampling algorithm is designed to de-overlap and balance the subsets. Finally, a projective clustering transfer mapping mechanism (CTM) is constructed for all processed subsets by means of transfer learning, thereby reducing class overlapping and exploring the structure information of samples. The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples and learn as much information from overlapping samples as possible, thereby improving the handling of class overlapping while balancing the classes. In the experimental section, more than 30 public datasets and over ten representative algorithms are chosen for verification. The experimental results show that the DCSHS significantly outperforms the compared algorithms in terms of various evaluation criteria.
LG · Oct 25, 2022
A new Stack Autoencoder: Neighbouring Sample Envelope Embedded Stack Autoencoder Ensemble Model
Chuanyan Zhou, Jie Ma, Fan Li et al.
The stack autoencoder (SAE), as a representative deep network, performs very well in feature learning and has received extensive attention from researchers. However, existing deep SAEs focus on original samples without considering the hierarchical structural information between samples. To address this limitation, this paper proposes a new SAE model, the neighbouring envelope embedded stack autoencoder ensemble (NE_ESAE). First, the neighbouring sample envelope learning mechanism (NSELM) is proposed for preprocessing the input of the SAE. NSELM constructs sample pairs by combining neighbouring samples. In addition, the NSELM constructs multilayer sample spaces by multilayer iterative mean clustering, which considers similar samples and generates layers of envelope samples with hierarchical structural information. Second, an embedded stack autoencoder (ESAE) is proposed and trained in each layer of the sample space; it takes the original samples into account during training and in the network structure, thereby better capturing the relationship between original feature samples and deep feature samples. Third, feature reduction and base classification are conducted on each layer of envelope samples, and the classification results of every layer are output. Finally, the classification results of the layers of the envelope sample space are fused through the ensemble mechanism. In the experimental section, the proposed algorithm is validated on over ten representative public datasets. The results show that our method performs significantly better than existing traditional feature learning methods and representative deep autoencoders.
AS · Aug 13, 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
Yubing Cao, Yongming Li, Liejun Wang et al.
Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-limited spectral information, which inevitably sacrifices high-frequency details. To address this, we adopt the full-band Mel spectrogram information as input, aiming to provide the vocoder with the most comprehensive information possible. However, previous studies have revealed that the use of full-band spectral information as input can result in the issue of over-smoothing, compromising the naturalness of the synthesized speech. To tackle this challenge, we propose VNet, a GAN-based neural vocoder network that incorporates full-band spectral information and introduces a Multi-Tier Discriminator (MTD) comprising multiple sub-discriminators to generate high-resolution signals. Additionally, we introduce an asymptotically constrained method that modifies the adversarial loss of the generator and discriminator, enhancing the stability of the training process. Through rigorous experiments, we demonstrate that the VNet model is capable of generating high-fidelity speech and significantly improving the performance of the vocoder.
CV · Feb 28, 2025 · Code
BadRefSR: Backdoor Attacks Against Reference-based Image Super Resolution
Xue Yang, Tao Chen, Lei Guo et al.
Reference-based image super-resolution (RefSR) represents a promising advancement in super-resolution (SR). In contrast to single-image super-resolution (SISR), RefSR leverages an additional reference image to help recover high-frequency details, yet its vulnerability to backdoor attacks has not been explored. To fill this research gap, we propose a novel attack framework called BadRefSR, which embeds backdoors in the RefSR model by adding triggers to the reference images and training with a mixed loss function. Extensive experiments across various backdoor attack settings demonstrate the effectiveness of BadRefSR. The compromised RefSR network performs normally on clean input images, while outputting attacker-specified target images on triggered input images. Our study aims to alert researchers to the potential backdoor risks in RefSR. Codes are available at https://github.com/xuefusiji/BadRefSR.
LG · Mar 15, 2023
Health Monitoring of Movement Disorder Subject based on Diamond Stacked Sparse Autoencoder Ensemble Model
Likun Tang, Jie Ma, Yongming Li
The health monitoring of chronic diseases is very important for people with movement disorders because of their limited mobility and the long duration of chronic diseases. Machine learning-based processing of data collected with wearable sensors from people with movement disorders is currently an effective method for health monitoring. However, it is difficult to obtain large amounts of high-quality data with wearable sensor systems, so the requirement for diagnostic accuracy cannot be met. Moreover, existing machine learning methods do not handle this problem well. Feature learning is key to machine learning. To solve this problem, a health monitoring method for movement disorder subjects based on a diamond stacked sparse autoencoder ensemble model (DsaeEM) is proposed in this paper. This algorithm has two major components. First, feature expansion is designed using a feature-embedded stacked sparse autoencoder (FSSAE). Second, a feature reduction mechanism is designed to remove the redundancy among the expanded features. This mechanism includes an L1 regularized feature-reduction algorithm and an improved manifold dimensionality reduction algorithm. This paper refers to the combined feature expansion and feature reduction mechanism as the diamond-like feature learning mechanism. The method is experimentally compared with several state-of-the-art algorithms on two datasets. The results show that the proposed algorithm achieves clearly higher accuracy. In conclusion, this study developed an effective and feasible feature-learning algorithm for the recognition of chronic diseases.
CV · May 4, 2025
Drug classification based on X-ray spectroscopy combined with machine learning
Yongming Li, Peng Wang, Bangdong Han
The proliferation of new types of drugs necessitates the urgent development of faster and more accurate detection methods. Traditional detection methods have high requirements for instruments and environments, making the operation complex. X-ray absorption spectroscopy, a non-destructive detection technique, offers advantages such as ease of operation, penetrative observation, and strong substance differentiation capabilities, making it well-suited for application in the field of drug detection and identification. In this study, we constructed a classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Particle Swarm Optimization (PSO) to classify and identify drugs based on their X-ray spectral profiles. In the experiments, we selected 14 chemical reagents with chemical formulas similar to drugs as samples. We utilized CNN to extract features from the spectral data of these 14 chemical reagents and used the extracted features to train an SVM model. We also utilized PSO to optimize two critical initial parameters of the SVM. The experimental results demonstrate that this model achieved higher classification accuracy compared to two other common methods, with a prediction accuracy of 99.14%. Additionally, the model exhibited fast execution speed, mitigating the drawback of a drastic increase in running time and efficiency reduction that may result from the direct fusion of PSO and SVM. Therefore, the combined approach of X-ray absorption spectroscopy with CNN, PSO, and SVM provides a rapid, highly accurate, and reliable classification and identification method for the field of drug detection, holding promising prospects for widespread application.
SD · Apr 12, 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
Yubing Cao, Yinfeng Yu, Yongming Li et al.
This paper presents AMNet, an Acoustic Model Network designed to improve the performance of Mandarin speech synthesis by incorporating phrase structure annotation and local convolution modules. AMNet builds upon the FastSpeech 2 architecture while addressing the challenge of local context modeling, which is crucial for capturing intricate speech features such as pauses, stress, and intonation. By embedding a phrase structure parser into the model and introducing a local convolution module, AMNet enhances the model's sensitivity to local information. Additionally, AMNet decouples tonal characteristics from phonemes, providing explicit guidance for tone modeling, which improves tone accuracy and pronunciation. Experimental results demonstrate that AMNet outperforms baseline models in subjective and objective evaluations. The proposed model achieves superior Mean Opinion Scores (MOS), lower Mel Cepstral Distortion (MCD), and improved fundamental frequency ($F_0$) fitting in terms of $R^2$, confirming its ability to generate high-quality, natural, and expressive Mandarin speech.
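One of the objective metrics named above, Mel Cepstral Distortion, is commonly computed per frame as (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) and then averaged over frames. The sketch below follows that common definition. It assumes the reference and synthesized frames are already time-aligned and that the 0th (energy) coefficient is excluded. The abstract does not state which variant the authors used, so these choices and the function name are assumptions.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over aligned frames of mel-cepstral coefficients."""
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need the same, non-zero number of frames")
    const = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Skip index 0 (energy term) and compare the remaining coefficients.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += const * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


if __name__ == "__main__":
    reference = [[1.0, 0.50, -0.20], [1.1, 0.40, -0.10]]    # toy frames
    synthesized = [[0.9, 0.45, -0.25], [1.0, 0.42, -0.05]]  # toy frames
    print(mel_cepstral_distortion(reference, synthesized))
```

Lower values indicate that the synthesized spectral envelope is closer to the reference, which is why the abstract reports a lower MCD as an improvement.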
SD · Nov 17, 2021
Subject Enveloped Deep Sample Fuzzy Ensemble Learning Algorithm of Parkinson's Speech Data
Yiwen Wang, Fan Li, Xiaoheng Zhang et al.
Speech recognition for Parkinson's disease (PD) is an effective way to diagnose the disease and has become a hot and difficult research area in recent years. Each subject typically contributes a large number of corpus segments. However, too many segments increase the complexity of the classification model. In addition, clinicians are interested in finding diagnostic speech markers that reflect the pathology of the whole subject. Since the optimal relevant features of each speech sample segment are different, it is difficult to find uniform diagnostic speech markers. Therefore, it is necessary to reconstruct the many segments within one subject into a few segments, or even a single segment, which facilitates the extraction of relevant speech features to characterize diagnostic markers for the whole subject. To address this problem, an enveloped deep speech sample learning algorithm for Parkinson's subjects based on multilayer fuzzy c-means (MlFCM) clustering and interlayer consistency preservation is proposed in this paper. The algorithm can be used to achieve intra-subject sample reconstruction for PD to obtain a small number of high-quality prototype sample segments. At the end of the paper, the algorithm is compared with state-of-the-art related methods on several representative PD speech datasets. The experimental results show that the proposed algorithm is significantly effective.
AI · Nov 2, 2021
Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means Clustering and Minimum Interlayer discrepancy
Fan Li, Xiaoheng Zhang, Pin Wang et al.
Imbalanced learning is important and challenging since the problem of the classification of imbalanced datasets is prevalent in machine learning and data mining fields. Sampling approaches are proposed to address this issue, and cluster-based oversampling methods have shown great potential as they aim to simultaneously tackle between-class and within-class imbalance issues. However, all existing clustering methods are based on a one-time approach. Due to the lack of a priori knowledge, improper setting of the number of clusters often exists, which leads to poor clustering performance. Besides, the existing methods are likely to generate noisy instances. To solve these problems, this paper proposes a deep instance envelope network-based imbalanced learning algorithm with the multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD). This algorithm can guarantee high quality balanced instances using a deep instance envelope network in the absence of prior knowledge. In the experimental section, thirty-three popular public datasets are used for verification, and over ten representative algorithms are used for comparison. The experimental results show that the proposed approach significantly outperforms other popular methods.
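For readers unfamiliar with the building block named above, here is a minimal sketch of standard single-layer fuzzy c-means. It is not the multilayer MlFCM or the MIDMD mechanism from the paper. The function name, the explicit initial centers, and the toy data are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=50, eps=1e-12):
    """Alternate membership and center updates; return (centers, memberships).

    memberships[k][i] is the degree to which point k belongs to cluster i,
    computed against the centers from before the final center update.
    """
    exponent = 2.0 / (m - 1.0)
    centers = [list(c) for c in centers]
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(math.dist(p, c), eps) for c in centers]
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                   for d_i in dists]
            memberships.append(row)
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(dim)
            ]
    return centers, memberships


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    found, degrees = fuzzy_c_means(data, centers=[[0, 0], [10, 10]])
    print(found)
```

Unlike hard k-means, every point keeps a graded membership in every cluster. The number of clusters must still be fixed in advance, which is the sensitivity the abstract attributes to one-time clustering approaches.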
SD · Aug 23, 2021
Subject Envelope based Multitype Reconstruction Algorithm of Speech Samples of Parkinson's Disease
Yongming Li, Chengyu Liu, Pin Wang et al.
The risk of Parkinson's disease (PD) is extremely serious, and PD speech recognition is currently an effective method of diagnosis. However, due to the influence of the disease stage, corpus, and other factors on data collection, the ability of each sample within one subject to reflect the status of PD varies. No sample is totally useless, and no sample is 100% perfect. This characteristic means that it is not suitable simply to remove some samples or keep others. It is necessary to consider sample transformation to obtain high-quality new samples. Unfortunately, existing PD speech recognition methods focus mainly on feature learning and classifier design rather than sample learning, and few methods consider sample transformation. To solve this problem, a PD speech sample transformation algorithm based on multitype reconstruction operators is proposed in this paper. The algorithm is divided into four major steps. Three types of reconstruction operators are designed in the algorithm: types A, B and C. The type A operator directly reconstructs the original dataset by a linear transformation to obtain the first new dataset. The type B operator applies clustering and a linear transformation to the dataset to obtain the second new dataset. The type C operator reconstructs the dataset by clustering and convolution to obtain the third new dataset. Finally, the base classifier is trained on the three new datasets, and the classification results are fused by decision weighting. In the experimental section, two representative PD speech datasets are used for verification. The results show that the proposed algorithm is effective. Compared with other algorithms, the proposed algorithm achieves clear improvements in classification accuracy.
AI · Aug 15, 2021
A Fast Algorithm for Computing the Deficiency Number of a Mahjong Hand
Xueqing Yan, Yongming Li, Sanjiang Li
The tile-based multiplayer game Mahjong is widely played in Asia and has also become increasingly popular worldwide. Face-to-face or online, each player begins with a hand of 13 tiles and players draw and discard tiles in turn until they complete a winning hand. An important notion in Mahjong is the deficiency number (a.k.a. shanten number in Japanese Mahjong) of a hand, which estimates how many tile changes are necessary to complete the hand into a winning hand. The deficiency number plays an essential role in major decision-making tasks such as selecting a tile to discard. This paper proposes a fast algorithm for computing the deficiency number of a Mahjong hand. Compared with the baseline algorithm, the new algorithm is usually 100 times faster and, more importantly, respects the agent's knowledge about available tiles. The algorithm can be used as a basic procedure in all Mahjong variants by both rule-based and machine learning-based Mahjong AI.
IV · Aug 9, 2021
FA-GAN: Fused Attentive Generative Adversarial Networks for MRI Image Super-Resolution
Mingfeng Jiang, Minghao Zhi, Liying Wei et al.
High-resolution magnetic resonance images can provide fine-grained anatomical information, but acquiring such data requires a long scanning time. In this paper, a framework called the Fused Attentive Generative Adversarial Networks (FA-GAN) is proposed to generate super-resolution MR images from low-resolution magnetic resonance images, which can effectively reduce the scanning time while still providing high-resolution MR images. In the FA-GAN framework, the local fusion feature block, consisting of three-pass networks with different convolution kernels, is proposed to extract image features at different scales. In addition, the global feature fusion module, including the channel attention module, the self-attention module, and the fusion operation, is designed to enhance the important features of the MR image. Moreover, the spectral normalization process is introduced to make the discriminator network stable. 40 sets of 3D magnetic resonance images (each set of images contains 256 slices) are used to train the network, and 10 sets of images are used to test the proposed method. The experimental results show that the PSNR and SSIM values of the super-resolution magnetic resonance images generated by the proposed FA-GAN method are higher than those of state-of-the-art reconstruction methods.
CV · Mar 11, 2021
Integrated Age Estimation Mechanism
Fan Li, Yongming Li, Pin Wang et al.
Machine-learning-based age estimation has received a lot of attention. The traditional age estimation mechanism focuses on the age estimation error but ignores the deviation between the estimated age and real age caused by disease. The pathological age estimation mechanism previously proposed by the authors introduces age deviation to solve this problem and significantly improves the classification capability of the estimated age. However, it does not consider the age estimation error of the normal control (NC) group, which results in a larger error between the estimated age and real age of the NC group. Therefore, an integrated age estimation mechanism based on decision-level fusion of the error-oriented and deviation-oriented models is proposed to solve the problem. Firstly, the traditional age estimation and pathological age estimation mechanisms are weighted together. Secondly, their optimal weights are obtained by minimizing the mean absolute error (MAE) between the estimated age and real age of normal people. In the experimental section, several representative age-related datasets are used to verify the proposed method. The results show that the proposed age estimation mechanism achieves a good tradeoff. It not only improves the classification ability of the estimated age, but also reduces the age estimation error of the NC group. In general, the proposed age estimation mechanism is effective. Additionally, the mechanism is a general framework that can be used to construct different specific age estimation algorithms, contributing to relevant research.
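The two steps described above (weight the two estimators together, then pick the weight that minimizes MAE on normal controls) can be sketched with a simple grid search. The abstract does not say how the authors solve the minimization, so the single convex weight, the grid search, the function names, and the toy ages below are all assumptions.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def best_fusion_weight(traditional, pathological, real_age, steps=100):
    """Grid-search w in [0, 1] for fused = w * traditional + (1 - w) * pathological."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        fused = [w * t + (1.0 - w) * p for t, p in zip(traditional, pathological)]
        err = mean_absolute_error(fused, real_age)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


if __name__ == "__main__":
    # Toy normal-control data (assumption): one estimator overshoots, one undershoots.
    real = [60.0, 65.0, 70.0]
    traditional_est = [62.0, 67.0, 72.0]
    pathological_est = [57.0, 62.0, 67.0]
    print(best_fusion_weight(traditional_est, pathological_est, real))
```

Fitting the weight on the normal control group only reflects the abstract's point that, for healthy subjects, estimated age should match real age, while deviations for patients are left free to carry diagnostic information.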
CV · Jun 20, 2020
Deep Double-Side Learning Ensemble Model for Few-Shot Parkinson Speech Recognition
Yongming Li, Lang Zhou, Lingyun Qin et al.
Diagnosis and therapeutic effect assessment of Parkinson's disease based on voice data are very important, but the associated few-shot learning problem is challenging. Although deep learning is good at automatic feature extraction, it suffers from the few-shot learning problem. Therefore, the generally effective approach is to first conduct feature extraction based on prior knowledge and then carry out feature reduction for subsequent classification. However, there are two major problems: 1) Structural information among speech features has not been mined and new features of higher quality have not been reconstructed. 2) Structural information between data samples has not been mined and new samples with higher quality have not been reconstructed. To solve these two problems, based on the existing Parkinson speech feature data set, a deep double-side learning ensemble model is designed in this paper that can reconstruct speech features and samples deeply and simultaneously. For feature reconstruction, an embedded deep stacked group sparse auto-encoder is designed to conduct nonlinear feature transformation and acquire new high-level deep features, which are then fused with the original speech features by an L1 regularization feature selection method. For speech sample reconstruction, a deep sample learning algorithm based on iterative mean clustering is designed to conduct sample transformation and obtain new high-level deep samples. Finally, the bagging ensemble learning mode is adopted to fuse the deep feature learning algorithm and the deep sample learning algorithm, thereby constructing a deep double-side learning ensemble model. At the end of this paper, two representative speech datasets of Parkinson's disease are used for verification. The experimental results show that the proposed algorithm is effective.
LG · Feb 17, 2020
Hybrid Embedded Deep Stacked Sparse Autoencoder with w_LPPD SVM Ensemble
Yongming Li, Yan Lei, Pin Wang et al.
Deep learning is a feature learning method with strong nonlinear feature transformation capability and is becoming increasingly important in many fields of artificial intelligence. The deep autoencoder is a representative deep learning method and can effectively extract abstract information from datasets. However, it does not consider the complementarity between the deep features and original features during deep feature transformation. Besides, it suffers from the small sample problem. To solve these problems, a novel deep autoencoder, the hybrid feature embedded stacked sparse autoencoder (HFESAE), is proposed in this paper. HFESAE is capable of learning discriminant deep features by embedding original features to filter weak hidden-layer outputs during training. Because the class representation ability of abstract information is limited by the small sample problem, a feature fusion strategy is designed to combine the abstract information learned by HFESAE with the original features and obtain hybrid features for feature reduction. The strategy is a hybrid feature selection strategy based on L1 regularization followed by a support vector machine (SVM) ensemble model, in which weighted local discriminant preservation projection (w_LPPD) is designed and employed on each base classifier. At the end of this paper, several representative public datasets are used to verify the effectiveness of the proposed algorithm. The experimental results demonstrate that the proposed feature learning method yields superior performance compared to other existing and state-of-the-art feature learning algorithms, including some representative deep autoencoder methods.
LGFeb 10, 2020
Classification Algorithm of Speech Data of Parkinsons Disease Based on Convolution Sparse Kernel Transfer Learning with Optimal Kernel and Parallel Sample Feature SelectionXiaoheng Zhang, Yongming Li, Pin Wang et al.
Labeled speech data from patients with Parkinson's disease (PD) are scarce, and the statistical distributions of training and test data differ significantly in the existing datasets. To solve these problems, dimensionality reduction and sample augmentation must be considered. In this paper, a novel PD classification algorithm based on sparse kernel transfer learning combined with a parallel optimization of samples and features is proposed. Sparse transfer learning is used to extract effective structural information of PD speech features from public datasets as source domain data, and the fast ADMM iteration is improved to enhance the information extraction performance. To implement the parallel optimization, the potential relationships between samples and features are considered to obtain high-quality combined features. First, features are extracted from a specific public speech dataset to construct a feature dataset as the source domain. Then, the PD target domain, including the training and test datasets, is encoded by convolution sparse coding, which can extract more in-depth information. Next, parallel optimization is implemented. To further improve the classification performance, a convolution kernel optimization mechanism is designed. Using two representative public datasets and one self-constructed dataset, the experiments compare over thirty relevant algorithms. The results show that when the Sakar, MaxLittle, and DNSH datasets are taken as target domains, the proposed algorithm achieves clear improvements in classification accuracy. The proposed algorithm also improves substantially on non-transfer learning approaches, demonstrating that transfer learning is both more effective and has a more acceptable time cost.
CVJan 17, 2020
Two-Phase Object-Based Deep Learning for Multi-temporal SAR Image Change DetectionXinzheng Zhang, Guo Liu, Ce Zhang et al.
Change detection is one of the fundamental applications of synthetic aperture radar (SAR) images. However, the speckle noise present in SAR images has a strongly negative effect on change detection. In this research, a novel two-phase object-based deep learning approach is proposed for multi-temporal SAR image change detection. Compared with traditional methods, the proposed approach brings two main innovations. One is to classify all pixels into three categories rather than two: unchanged pixels, changed pixels caused by strong speckle (false changes), and changed pixels formed by real terrain variation (real changes). The other is to group neighboring pixels into superpixel objects so as to exploit local spatial context. Two phases are designed in the methodology: 1) Generate objects based on the simple linear iterative clustering (SLIC) algorithm, and discriminate these objects into changed and unchanged classes using fuzzy c-means (FCM) clustering and a deep PCANet. The output of this phase is the set of changed and unchanged superpixels. 2) Apply deep learning to the pixel sets over the changed superpixels only, obtained in the first phase, to discriminate real changes from false changes. SLIC is employed again to generate new superpixels in the second phase. Low-rank and sparse decomposition is applied to these new superpixels to suppress speckle noise significantly. A further clustering step is applied to these new superpixels via FCM. A new PCANet is then trained to classify the two kinds of changed superpixels and produce the final change maps. Numerical experiments demonstrate that, compared with benchmark methods, the proposed approach can distinguish real changes from false changes effectively with significantly reduced false alarm rates, and achieves up to 99.71% change detection accuracy using multi-temporal SAR imagery.
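As a rough illustration of the fuzzy c-means (FCM) step mentioned above, the sketch below clusters scalar values, for example one change-magnitude score per superpixel, into `c` soft clusters. The one-dimensional input, the evenly spread initial centers, and the fuzzifier `m = 2` are assumptions made for this example; the abstract does not state how the paper configures FCM.

```python
def fcm_memberships(values, centers, m=2.0):
    """Membership of each value in each cluster; every row sums to 1."""
    exponent = -2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if any(d == 0.0 for d in dists):
            # A value sitting exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            inv = [d ** exponent for d in dists]
            total = sum(inv)
            rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means(values, c=2, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centers spread evenly over the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iters):
        u = fcm_memberships(values, centers, m)
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # keep a center with no support
            else:
                new_centers.append(
                    sum(w * x for w, x in zip(weights, values)) / total
                )
        centers = new_centers
    return centers, fcm_memberships(values, centers, m)


def hard_labels(memberships):
    """Assign each value to the cluster with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

With two clusters, the hard labels split the scores into a low group and a high group, which in a change detection setting would correspond to unchanged and changed candidates before any further classification.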
AIJan 2, 2012
Continuity in Information AlgebrasXuechong Guan, Yongming Li
In this paper, continuity and strong continuity are introduced in domain-free information algebras and labeled information algebras, respectively. A more general concept of continuous function, defined between two domain-free continuous information algebras, is presented. It is shown that, with the operations of combination and focusing, the set of all continuous functions between two domain-free s-continuous information algebras forms a new s-continuous information algebra. By studying the relationship between domain-free information algebras and labeled information algebras, it is demonstrated that they correspond to each other on s-compactness.