CV · Mar 15, 2022
GCT: Graph Co-Training for Semi-Supervised Few-Shot Learning
Rui Xu, Lei Xing, Shuai Shao et al.
Few-shot learning (FSL), which aims to address data scarcity, has attracted considerable attention in recent years. A popular FSL framework contains two phases: (i) the pre-train phase uses the base data to train a CNN-based feature extractor; (ii) the meta-test phase applies the frozen feature extractor to novel data (whose categories differ from those of the base data) and designs a classifier for recognition. To correct the few-shot data distribution, researchers have proposed Semi-Supervised Few-Shot Learning (SSFSL), which introduces unlabeled data. Although SSFSL has been shown to achieve outstanding performance in the FSL community, a fundamental problem remains: the pre-trained feature extractor cannot adapt to the novel data flawlessly because of the cross-category setting, and a large amount of noise is usually introduced into the novel features. We dub this the Feature-Extractor-Maladaptive (FEM) problem. To tackle FEM, we make two efforts in this paper. First, we propose a novel label prediction method, Isolated Graph Learning (IGL). IGL introduces the Laplacian operator to encode the raw data into graph space, which helps reduce the dependence on features when classifying, and then projects the graph representation to label space for prediction. The key point is that IGL can weaken the negative influence of noise from the feature representation perspective and is also flexible enough to complete training and testing procedures independently, which makes it suitable for SSFSL. Second, we propose Graph Co-Training (GCT) to tackle this challenge from a multi-modal fusion perspective by extending the proposed IGL to the co-training framework. GCT is a semi-supervised method that exploits the unlabeled samples with two modal features to cross-strengthen the IGL classifier.
LG · Oct 31, 2022
SEVGGNet-LSTM: a fused deep learning model for ECG classification
Tongyue He, Yiming Chen, Junxin Chen et al.
This paper presents a fused deep learning algorithm for ECG classification. It takes advantage of a combined convolutional and recurrent neural network for ECG classification and of the weight allocation capability of the attention mechanism. The input ECG signals are first segmented and normalized, and then fed into the combined VGG and LSTM network for feature extraction and classification. An attention mechanism (SE block) is embedded into the core network to increase the weight of important features. Two databases from different sources and devices are employed for performance validation, and the results demonstrate the effectiveness and robustness of the proposed algorithm.
IV · Sep 5, 2022
Uformer-ICS: A U-Shaped Transformer for Image Compressive Sensing Service
Kuiyuan Zhang, Zhongyun Hua, Yuanman Li et al.
Many service computing applications require real-time dataset collection from multiple devices, necessitating efficient sampling techniques to reduce bandwidth and storage pressure. Compressive sensing (CS) has found wide-ranging applications in image acquisition and reconstruction. Recently, numerous deep-learning methods have been introduced for CS tasks. However, the accurate reconstruction of images from measurements remains a significant challenge, especially at low sampling rates. In this paper, we propose Uformer-ICS as a novel U-shaped transformer for image CS tasks by introducing inner characteristics of CS into transformer architecture. To utilize the uneven sparsity distribution of image blocks, we design an adaptive sampling architecture that allocates measurement resources based on the estimated block sparsity, allowing the compressed results to retain maximum information from the original image. Additionally, we introduce a multi-channel projection (MCP) module inspired by traditional CS optimization methods. By integrating the MCP module into the transformer blocks, we construct projection-based transformer blocks, and then form a symmetrical reconstruction model using these blocks and residual convolutional blocks. Therefore, our reconstruction model can simultaneously utilize the local features and long-range dependencies of image, and the prior projection knowledge of CS theory. Experimental results demonstrate its significantly better reconstruction performance than state-of-the-art deep learning-based CS methods.
SY · May 10 · Code
PolarNet: Single-Minima Neural Network for Modeling Lyapunov Functions
Yuan Zhong, Jiaxin Cheng, Hefu Ye et al.
Learning control strategies with provable stability guarantees continues to be a challenging problem. In this work, we examine a family of training-time behaviors exhibited by existing neural Lyapunov control methods under specific conditions, which can hinder the synthesis of a provably stable controller. We identify the root cause as the lack of neural network architectural guarantees on the learned Lyapunov function, and propose PolarNet, a network architecture that provably addresses these issues by structurally guaranteeing a single critical point. We provide theoretical guarantees regarding the properness and universality of PolarNet for modeling Lyapunov functions, and show that using it as a drop-in replacement in existing neural Lyapunov control methods can effectively circumvent particular difficulties in training. We conduct a set of numerical experiments to verify that PolarNet consistently maintains a single critical point and, when used as a drop-in replacement in existing neural Lyapunov control methods, successfully avoids training failures caused by the lack of architectural guarantees. The code of this paper is available at https://github.com/23-zy/PolarNet.
CV · Sep 7, 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng, Zixu Zhao, Tong He et al.
Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-attention module tailored to enrich layout-to-image generation. This module notably improves the representation of layout regions, particularly in scenarios where existing methods struggle with highly complex and detailed textual descriptions. Moreover, while current open-vocabulary L2I methods are trained in an open-set setting, their evaluations often occur in closed-set environments. To bridge this gap, we propose two metrics to assess L2I performance in open-vocabulary scenarios. Additionally, we conduct a comprehensive user study to validate the consistency of these metrics with human preferences.
CV · Oct 11, 2023
Multiview Transformer: Rethinking Spatial Information in Hyperspectral Image Classification
Jie Zhang, Yongshan Zhang, Yicong Zhou
Identifying the land cover category for each pixel in a hyperspectral image (HSI) relies on spectral and spatial information. An HSI cuboid with a specific patch size is utilized to extract the spatial-spectral feature representation for the central pixel. In this article, we find that scene-specific but non-essential correlations may be recorded in an HSI cuboid. This additional information improves the model performance on existing HSI datasets and makes it hard to properly evaluate the ability of a model. We refer to this problem as the spatial overfitting issue and utilize strict experimental settings to avoid it. We further propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), a spectral encoder-decoder (SED), and a spatial-pooling tokenization transformer (SPTT). MPCA performs dimension reduction on an HSI by constructing spectral multiview observations and applying PCA to each view to extract a low-dimensional view representation. The combination of view representations, named the multiview representation, is the dimension reduction output of the MPCA. To aggregate the multiview information, a fully-convolutional SED with a U-shape in the spectral dimension is introduced to extract a multiview feature map. SPTT transforms the multiview features into tokens using the spatial-pooling tokenization strategy and learns robust and discriminative spatial-spectral features for land cover identification. Classification is conducted with a linear classifier. Experiments on three HSI datasets with rigid settings demonstrate the superiority of the proposed multiview transformer over the state-of-the-art methods.
CV · Apr 20
Voronoi-guided Bilateral 2D Gaussian Splatting for Arbitrary-Scale Hyperspectral Image Super-Resolution
Jie Zhang, Jinkun You, Shi Chen et al.
Most existing hyperspectral image super-resolution methods require modifications for different scales, limiting their flexibility in arbitrary-scale reconstruction. 2D Gaussian splatting provides a continuous representation that is compatible with arbitrary-scale super-resolution. Existing methods often rely on rasterization strategies, which may limit flexible spatial modeling. Extending them to hyperspectral image super-resolution remains challenging, as the task requires adaptive spatial reconstruction while preserving spectral fidelity. This paper proposes GaussianHSI, a Gaussian-Splatting-based framework for arbitrary-scale hyperspectral image super-resolution. We develop a Voronoi-Guided Bilateral 2D Gaussian Splatting for spatial reconstruction. After predicting a set of Gaussian functions to represent the input, it associates each target pixel with relevant Gaussian functions through Voronoi-guided selection. The target pixel is then reconstructed by aggregating the selected Gaussian functions with reference-aware bilateral weighting, which considers both geometric relevance and consistency with low-resolution features. We further introduce a Spectral Detail Enhancement module to improve spectral reconstruction. Extensive experiments on benchmark datasets demonstrate the effectiveness of GaussianHSI over state-of-the-art methods for arbitrary-scale hyperspectral image super-resolution.
CV · Mar 21
MEMO: Human-like Crisp Edge Detection Using Masked Edge Prediction
Jiaxin Cheng, Yue Wu, Yicong Zhou
Learning-based edge detection models trained with cross-entropy loss often suffer from thick edge predictions, which deviate from the crisp, single-pixel annotations typically provided by humans. While previous approaches to achieving crisp edges have focused on designing specialized loss functions or modifying network architectures, we show that a carefully designed training and inference strategy alone is sufficient to achieve human-like edge quality. In this work, we introduce the Masked Edge Prediction MOdel (MEMO), which produces both accurate and crisp edges using only cross-entropy loss. We first construct a large-scale synthetic edge dataset to pre-train MEMO, enhancing its generalization ability. Subsequent fine-tuning on downstream datasets requires only a lightweight module comprising 1.2% additional parameters. During training, MEMO learns to predict edges under varying ratios of input masking. A key insight guiding our inference is that thick edge predictions typically exhibit a confidence gradient: high in the center and lower toward the boundaries. Leveraging this, we propose a novel progressive prediction strategy that sequentially finalizes edge predictions in order of prediction confidence, resulting in thinner and more precise contours. Our method achieves visually appealing, post-processing-free, human-like edge maps and outperforms prior methods on crispness-aware evaluations.
CV · May 24, 2021 · Code
LineCounter: Learning Handwritten Text Line Segmentation by Counting
Deng Li, Yue Wu, Yicong Zhou
Handwritten Text Line Segmentation (HTLS) is a low-level but important task for many higher-level document processing tasks like handwritten text recognition. It is often formulated in terms of semantic segmentation or object detection in deep learning. However, both formulations have serious shortcomings. The former requires heavy post-processing of splitting/merging adjacent segments, while the latter may fail on dense or curved texts. In this paper, we propose a novel Line Counting formulation for HTLS -- that involves counting the number of text lines from the top at every pixel location. This formulation helps learn an end-to-end HTLS solution that directly predicts per-pixel line number for a given document image. Furthermore, we propose a deep neural network (DNN) model LineCounter to perform HTLS through the Line Counting formulation. Our extensive experiments on the three public datasets (ICDAR2013-HSC, HIT-MW, and VML-AHTE) demonstrate that LineCounter outperforms state-of-the-art HTLS approaches. Source code is available at https://github.com/Leedeng/Line-Counter.
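To make the Line Counting formulation above more concrete, here is a minimal sketch of how a per-pixel "line number from the top" target could be derived from a binary text-line mask. It is only an illustration of the labeling idea under simplifying assumptions (straight, column-wise counting on a clean mask); the function name `line_count_map` is hypothetical, and this is not the LineCounter network or the authors' label-generation code.

```python
def line_count_map(line_mask):
    """Toy per-pixel line counter (hypothetical helper, not from the paper).

    line_mask is a 2D list of 0/1 values, where 1 marks a pixel that belongs
    to a text line. For every pixel, the result holds how many text lines
    have been entered when scanning that pixel's column from the top.
    """
    rows, cols = len(line_mask), len(line_mask[0])
    counts = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        seen = 0      # number of lines entered so far in this column
        previous = 0  # mask value of the pixel directly above
        for r in range(rows):
            current = line_mask[r][c]
            if current and not previous:
                seen += 1  # a 0 -> 1 transition starts a new text line
            counts[r][c] = seen
            previous = current
        # the column scan restarts from zero for the next column
    return counts


mask = [
    [0, 0],
    [1, 1],
    [0, 0],
    [1, 0],
]
for row in line_count_map(mask):
    print(row)
```

In this toy mask the left column crosses two lines and the right column crosses one, so the two columns end with different counts.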
CV · May 12, 2021 · Code
SauvolaNet: Learning Adaptive Sauvola Network for Degraded Document Binarization
Deng Li, Yue Wu, Yicong Zhou
Inspired by the classic Sauvola local image thresholding approach, we systematically study it from the deep neural network (DNN) perspective and propose a new solution called SauvolaNet for degraded document binarization (DDB). It is composed of three explainable modules, namely, Multi-Window Sauvola (MWS), Pixelwise Window Attention (PWA), and Adaptive Sauvola Threshold (AST). The MWS module faithfully reflects the classic Sauvola method but with trainable parameters and multi-window settings. The PWA module estimates the preferred window sizes for each pixel location. The AST module further consolidates the outputs from MWS and PWA and predicts the final adaptive threshold for each pixel location. As a result, SauvolaNet becomes end-to-end trainable and significantly reduces the number of required network parameters to 40K, which is only 1% of MobileNetV2. At the same time, it achieves State-of-The-Art (SoTA) performance for the DDB task: SauvolaNet is at least comparable to, if not better than, SoTA binarization solutions in our extensive studies on the 13 public document binarization datasets. Our source code is available at https://github.com/Leedeng/SauvolaNet.
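For readers unfamiliar with the classic method this abstract builds on, the sketch below implements plain Sauvola thresholding, where the local threshold is T = m * (1 + k * (s / R - 1)) with local mean m and local standard deviation s. The window size, k = 0.2 and R = 128 are common textbook defaults chosen here as assumptions; this is a single-window, non-trainable baseline, not SauvolaNet or its MWS, PWA, or AST modules.

```python
import math


def sauvola_threshold(img, window=3, k=0.2, R=128.0):
    """Classic Sauvola threshold map for a 2D list of grayscale values.

    window, k and R are illustrative defaults (assumptions), not values
    taken from the SauvolaNet paper.
    """
    h, w = len(img), len(img[0])
    half = window // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the local window, clipped at the image border.
            vals = [
                img[j][i]
                for j in range(max(0, y - half), min(h, y + half + 1))
                for i in range(max(0, x - half), min(w, x + half + 1))
            ]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            out[y][x] = mean * (1.0 + k * (math.sqrt(var) / R - 1.0))
    return out


def binarize(img, **kwargs):
    """Return 1 for background (above threshold) and 0 for ink."""
    thr = sauvola_threshold(img, **kwargs)
    return [
        [1 if img[y][x] > thr[y][x] else 0 for x in range(len(img[0]))]
        for y in range(len(img))
    ]


page = [
    [200, 200, 200],
    [200, 10, 200],
    [200, 200, 200],
]
for row in binarize(page):
    print(row)
```

The dark center pixel falls below its local threshold while the bright surroundings stay above theirs, which is the behavior the trainable modules described above generalize per pixel and per window size.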
CV · Jul 19, 2020 · Code
Geometry Constrained Weakly Supervised Object Localization
Weizeng Lu, Xi Jia, Weicheng Xie et al.
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization (WSOL). GC-Net consists of three modules: a detector, a generator and a classifier. The detector predicts the object location defined by a set of coefficients describing a geometric shape (i.e. ellipse or rectangle), which is geometrically constrained by the mask produced by the generator. The classifier takes the resulting masked images as input and performs two complementary classification tasks for the object and background. To make the mask more compact and more complete, we propose a novel multi-task loss function that takes into account the area of the geometric shape, the categorical cross-entropy and the negative entropy. In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing (e.g. thresholding) that may require additional tuning. Extensive experiments on the CUB-200-2011 and ILSVRC2012 datasets show that GC-Net outperforms state-of-the-art methods by a large margin. Our source code is available at https://github.com/lwzeng/GC-Net.
CR · Apr 11, 2012 · Code
A Novel Latin Square Image Cipher
Yue Wu, Yicong Zhou, Joseph P. Noonan et al.
In this paper, we introduce a symmetric-key Latin square image cipher (LSIC) for grayscale and color images. Our contributions to the image encryption community are as follows: 1) we develop new Latin square image encryption primitives including Latin Square Whitening, Latin Square S-box and Latin Square P-box; 2) we provide a new way of integrating probabilistic encryption in image encryption by embedding random noise in the least significant image bit-plane; and 3) we construct LSIC with these Latin square image encryption primitives all on one keyed Latin square in a new loom-like substitution-permutation network. Consequently, the proposed LSIC achieves many desired properties of a secure cipher, including a large key space, high key sensitivity, uniformly distributed ciphertext, excellent confusion and diffusion properties, semantic security, and robustness against channel noise. Theoretical analysis shows that the LSIC has good resistance to many attack models including brute-force attacks, ciphertext-only attacks, known-plaintext attacks and chosen-plaintext attacks. Experimental analysis under extensive simulation results using the complete USC-SIPI Miscellaneous image dataset demonstrates that LSIC outperforms or matches the state of the art suggested by many peer algorithms. All these analyses and results demonstrate that the LSIC is very suitable for digital image encryption. Finally, we open-source the LSIC MATLAB code at https://sites.google.com/site/tuftsyuewu/source-code.
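As background for the Latin square primitives mentioned above, the following sketch shows why a Latin square is attractive for substitution: every row and every column is a permutation of the symbol set, so a row lookup is always invertible. The construction `keyed_latin_square`, the helpers `substitute` and `invert`, and the use of Python's non-cryptographic `random` module are all illustrative assumptions; none of this reproduces the LSIC whitening, S-box, or P-box definitions.

```python
import random


def keyed_latin_square(n, seed):
    """Build an n x n Latin square from a seed (toy construction).

    Row r is a shuffled base row rotated by a distinct shift. Python's
    random module is NOT cryptographically secure; it is used here only
    to keep the sketch short.
    """
    rng = random.Random(seed)
    symbols = list(range(n))
    rng.shuffle(symbols)
    shifts = list(range(n))
    rng.shuffle(shifts)
    return [[symbols[(c + shifts[r]) % n] for c in range(n)] for r in range(n)]


def is_latin(square):
    """Check that every row and every column contains each symbol once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def substitute(square, row, value):
    """Toy substitution: replace value by the entry in the chosen row."""
    return square[row][value]


def invert(square, row, value):
    """Undo substitute(); possible because each row is a permutation."""
    return square[row].index(value)


square = keyed_latin_square(8, seed=42)
print(is_latin(square))
print(all(invert(square, 3, substitute(square, 3, v)) == v for v in range(8)))
```

Because columns are permutations too, the same table could equally be indexed column-wise, which is one reason a single keyed square can serve several roles in a cipher design.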
CV · Mar 4
Towards Generalized Multimodal Homography Estimation
Jinkun You, Jiaxin Cheng, Jie Zhang et al.
Supervised and unsupervised homography estimation methods depend on image pairs tailored to specific modalities to achieve high accuracy. However, their performance deteriorates substantially when applied to unseen modalities. To address this issue, we propose a training data synthesis method that generates unaligned image pairs with ground-truth offsets from a single input image. Our approach renders the image pairs with diverse textures and colors while preserving their structural information. These synthetic data empower the trained model to achieve greater robustness and improved generalization across various domains. Additionally, we design a network to fully leverage cross-scale information and decouple color information from feature representations, thus improving estimation accuracy. Extensive experiments show that our training data synthesis method improves generalization performance. The results also confirm the effectiveness of the proposed network.
CV · Nov 12, 2025
Dense Cross-Scale Image Alignment With Fully Spatial Correlation and Just Noticeable Difference Guidance
Jinkun You, Jiaxue Li, Jie Zhang et al.
Existing unsupervised image alignment methods exhibit limited accuracy and high computational complexity. To address these challenges, we propose a dense cross-scale image alignment model. It takes into account the correlations between cross-scale features to decrease the alignment difficulty. Our model supports flexible trade-offs between accuracy and efficiency by adjusting the number of scales utilized. Additionally, we introduce a fully spatial correlation module to further improve accuracy while maintaining low computational costs. We incorporate the just noticeable difference to encourage our model to focus on image regions more sensitive to distortions, eliminating noticeable alignment errors. Extensive quantitative and qualitative experiments demonstrate that our method surpasses state-of-the-art approaches.
LG · Jan 21, 2025
Highly Efficient Rotation-Invariant Spectral Embedding for Scalable Incomplete Multi-View Clustering
Xinxin Wang, Yongshan Zhang, Yicong Zhou
Incomplete multi-view clustering presents significant challenges due to missing views. Although many existing graph-based methods aim to recover missing instances or complete similarity matrices with promising results, they still face several limitations: (1) Recovered data may be unsuitable for spectral clustering, as these methods often ignore guidance from spectral analysis; (2) Complex optimization processes require high computational burden, hindering scalability to large-scale problems; (3) Most methods do not address the rotational mismatch problem in spectral embeddings. To address these issues, we propose a highly efficient rotation-invariant spectral embedding (RISE) method for scalable incomplete multi-view clustering. RISE learns view-specific embeddings from incomplete bipartite graphs to capture the complementary information. Meanwhile, a complete consensus representation with second-order rotation-invariant property is recovered from these incomplete embeddings in a unified model. Moreover, we design a fast alternating optimization algorithm with linear complexity and promising convergence to solve the proposed formulation. Extensive experiments on multiple datasets demonstrate the effectiveness, scalability, and efficiency of RISE compared to the state-of-the-art methods.
CV · May 5, 2025
Quaternion Infrared Visible Image Fusion
Weihua Yang, Yicong Zhou
Visible images provide rich details and color information only under well-lighted conditions while infrared images effectively highlight thermal targets under challenging conditions such as low visibility and adverse weather. Infrared-visible image fusion aims to integrate complementary information from infrared and visible images to generate a high-quality fused image. Existing methods exhibit critical limitations such as neglecting color structure information in visible images and performance degradation when processing low-quality color-visible inputs. To address these issues, we propose a quaternion infrared-visible image fusion (QIVIF) framework to generate high-quality fused images completely in the quaternion domain. QIVIF proposes a quaternion low-visibility feature learning model to adaptively extract salient thermal targets and fine-grained texture details from input infrared and visible images respectively under diverse degraded conditions. QIVIF then develops a quaternion adaptive unsharp masking method to adaptively improve high-frequency feature enhancement with balanced illumination. QIVIF further proposes a quaternion hierarchical Bayesian fusion model to integrate infrared saliency and enhanced visible details to obtain high-quality fused images. Extensive experiments across diverse datasets demonstrate that our QIVIF surpasses state-of-the-art methods under challenging low-visibility conditions.
CV · May 5, 2025
Quaternion Sparse Decomposition for Multi-focus Color Image Fusion
Weihua Yang, Yicong Zhou
Multi-focus color image fusion refers to integrating multiple partially focused color images to create a single all-in-focus color image. However, existing methods struggle with complex real-world scenarios due to limitations in handling color information and intricate textures. To address these challenges, this paper proposes a quaternion multi-focus color image fusion framework to perform high-quality color image fusion completely in the quaternion domain. This framework introduces 1) a quaternion sparse decomposition model to jointly learn fine-scale image details and structure information of color images in an iterative fashion for high-precision focus detection, 2) a quaternion base-detail fusion strategy to individually fuse base-scale and detail-scale results across multiple color images for preserving structure and detail information, and 3) a quaternion structural similarity refinement strategy to adaptively select optimal patches from initial fusion results and obtain the final fused result for preserving fine details and ensuring spatially consistent outputs. Extensive experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
CV · Feb 4, 2025
DCT-Mamba3D: Spectral Decorrelation and Spatial-Spectral Feature Extraction for Hyperspectral Image Classification
Weijia Cao, Xiaofei Yang, Yicong Zhou et al.
Hyperspectral image classification presents challenges due to spectral redundancy and complex spatial-spectral dependencies. This paper proposes a novel framework, DCT-Mamba3D, for hyperspectral image classification. DCT-Mamba3D incorporates: (1) a 3D spectral-spatial decorrelation module that applies 3D discrete cosine transform basis functions to reduce both spectral and spatial redundancy, enhancing feature clarity across dimensions; (2) a 3D-Mamba module that leverages a bidirectional state-space model to capture intricate spatial-spectral dependencies; and (3) a global residual enhancement module that stabilizes feature representation, improving robustness and convergence. Extensive experiments on benchmark datasets show that our DCT-Mamba3D outperforms the state-of-the-art methods in challenging scenarios such as the same object in different spectra and different objects in the same spectra.
CV · Sep 7, 2021
CIM: Class-Irrelevant Mapping for Few-Shot Classification
Shuai Shao, Lei Xing, Yixin Chen et al.
Few-shot classification (FSC) has been one of the most closely watched topics in recent years. The general setting consists of two phases: (1) pre-train a feature extraction model (FEM) with base data (which has large amounts of labeled samples); (2) use the FEM to extract the features of novel data (with few labeled samples and categories entirely different from those of the base data), then classify them with a to-be-designed classifier. The adaptability of the pre-trained FEM to novel data determines the accuracy of the novel features and thereby affects the final classification performance. Consequently, how to appraise the pre-trained FEM is the most crucial focus in the FSC community. It might seem that traditional Class Activation Mapping (CAM) based methods could achieve this by overlaying weighted feature maps. However, due to the particularity of FSC (e.g., there is no backpropagation when using the pre-trained FEM to extract novel features), we cannot activate the feature map with the novel classes. To address this challenge, we propose a simple, flexible method, dubbed Class-Irrelevant Mapping (CIM). Specifically, we first introduce dictionary learning theory and view the channels of the feature map as the bases in a dictionary. Then we utilize the feature map to fit the feature vector of an image to obtain the corresponding channel weights. Finally, we overlay the weighted feature map for visualization to appraise the ability of the pre-trained FEM on novel data. For fair use of CIM in evaluating different models, we propose a new measurement index, called Feature Localization Accuracy (FLA). In experiments, we first compare our CIM with CAM in regular tasks and achieve outstanding performance. Next, we use our CIM to appraise several classical FSC frameworks without considering the classification results and discuss them.
CR · Jun 27, 2021
Secure Reversible Data Hiding in Encrypted Images Using Cipher-Feedback Secret Sharing
Zhongyun Hua, Yanxiang Wang, Shuang Yi et al.
Reversible data hiding in encrypted images (RDH-EI) has attracted increasing attention, since it can protect the privacy of original images while the embedded data can be exactly extracted. Recently, some RDH-EI schemes with multiple data hiders have been proposed using the secret sharing technique. However, these schemes protect the contents of the original images with only a lightweight security level. In this paper, we propose a high-security RDH-EI scheme with multiple data hiders. First, we introduce a cipher-feedback secret sharing (CFSS) technique. It follows the cryptography standards by introducing the cipher-feedback strategy of AES. Then, using the CFSS technique, we devise a new (r,n)-threshold (r<=n) RDH-EI scheme with multiple data hiders called CFSS-RDHEI. It can encrypt an original image into n encrypted images of reduced size using an encryption key and send each encrypted image to one data hider. Each data hider can independently embed secret data into the encrypted image to obtain the corresponding marked encrypted image. The original image can be completely recovered from r marked encrypted images and the encryption key. Performance evaluations show that our CFSS-RDHEI scheme has a high embedding rate and that its generated encrypted images are much smaller than those of existing secret sharing-based RDH-EI schemes. Security analysis demonstrates that it can achieve high security and defend against some common security attacks.
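The (r,n)-threshold property described above can be illustrated with textbook Shamir secret sharing over a small prime field: any r of the n shares recover the secret. This is a minimal sketch of plain Shamir sharing only, with the field size 257 and the function names `split` and `recover` chosen here as assumptions; it contains no cipher feedback, no image size reduction, and no data embedding, so it should not be read as the CFSS technique itself.

```python
import random

PRIME = 257  # smallest prime above 255, so any byte value fits in the field


def split(secret, r, n, rng):
    """Split one value into n shares; any r of them recover it (Shamir)."""
    # Random polynomial of degree r - 1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(r - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange interpolation at x = 0 using exactly the given shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


rng = random.Random(0)  # fixed seed for a repeatable demo; not secure
shares = split(123, r=3, n=5, rng=rng)
print(recover(shares[:3]))
print(recover([shares[0], shares[2], shares[4]]))
```

Both calls print the original value because each uses three distinct shares, matching the statement that the original can be recovered from any r of the n outputs.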
CV · Jun 24, 2021
Detection of Deepfake Videos Using Long Distance Attention
Wei Lu, Lingyi Liu, Junwei Luo et al.
With the rapid progress of deepfake techniques in recent years, facial video forgery can generate highly deceptive video content and bring severe security threats. The detection of such forged videos is therefore increasingly urgent and challenging. Most existing detection methods treat the problem as a vanilla binary classification problem. In this paper, the problem is treated as a special fine-grained classification problem, since the differences between fake and real faces are very subtle. It is observed that most existing face forgery methods leave some common artifacts in the spatial domain and time domain, including generative defects in the spatial domain and inter-frame inconsistencies in the time domain. A spatial-temporal model is proposed which has two components for capturing spatial and temporal forgery traces, respectively, from a global perspective. The two components are designed using a novel long distance attention mechanism. The spatial-domain component is used to capture artifacts in a single frame, and the time-domain component is used to capture artifacts in consecutive frames. They generate attention maps in the form of patches. The attention method has a broader vision, which contributes to better assembling global information and extracting local statistical information. Finally, the attention maps are used to guide the network to focus on pivotal parts of the face, just like other fine-grained classification methods. The experimental results on different public datasets demonstrate that the proposed method achieves state-of-the-art performance, and that the proposed long distance attention method can effectively capture pivotal parts for face forgery.
CV · Mar 8, 2021
Bridging the Distribution Gap of Visible-Infrared Person Re-identification with Modality Batch Normalization
Wenkang Li, Qi Ke, Wenbin Chen et al.
Visible-infrared cross-modality person re-identification (VI-ReID), whose aim is to match person images between the visible and infrared modalities, is a challenging cross-modality image retrieval task. Most existing works integrate batch normalization layers into their neural networks, but we found that batch normalization layers lead to two types of distribution gaps: 1) the inter-mini-batch distribution gap, i.e. the distribution gap of the same modality between mini-batches; 2) the intra-mini-batch modality distribution gap, i.e. the distribution gap between different modalities within the same mini-batch. To address these problems, we propose a new batch normalization layer called Modality Batch Normalization (MBN), which normalizes each modality sub-mini-batch separately instead of the whole mini-batch, and can reduce these distribution gaps significantly. Extensive experiments show that our MBN is able to boost the performance of VI-ReID models, even with different datasets, backbones and losses.
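The core idea stated above, normalizing each modality sub-mini-batch separately rather than the whole mini-batch, can be sketched in a few lines. The example below works on a single scalar feature channel and omits the learnable scale and shift as well as running statistics; the function names and the string modality tags are assumptions made for illustration, not the authors' layer.

```python
import math


def batch_norm(values, eps=1e-5):
    """Standard normalization of one channel over the whole (sub-)batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def modality_batch_norm(features, modalities, eps=1e-5):
    """Normalize each modality's samples with their own statistics."""
    out = [0.0] * len(features)
    for tag in set(modalities):
        idx = [i for i, m in enumerate(modalities) if m == tag]
        normed = batch_norm([features[i] for i in idx], eps)
        for i, v in zip(idx, normed):
            out[i] = v
    return out


# Toy channel: infrared activations sit far from visible ones.
features = [1.0, 2.0, 3.0, 101.0, 102.0, 103.0]
modalities = ["vis", "vis", "vis", "ir", "ir", "ir"]

print(batch_norm(features))                       # modalities stay separated
print(modality_batch_norm(features, modalities))  # modalities are aligned
```

With whole-batch statistics the two modalities end up on opposite sides of zero, whereas per-modality statistics map both sub-batches onto the same normalized range.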
CV · Mar 8, 2021
Unified Batch All Triplet Loss for Visible-Infrared Person Re-identification
Wenkang Li, Ke Qi, Wenbin Chen et al.
Visible-Infrared cross-modality person re-identification (VI-ReID), whose aim is to match person images between the visible and infrared modalities, is a challenging cross-modality image retrieval task. Batch Hard Triplet loss is widely used in person re-identification tasks, but it does not perform well in the visible-infrared person re-identification task: because it only optimizes the hardest triplet for each anchor image within the mini-batch, samples in the hardest triplet may all belong to the same modality, which leads to an imbalance in modality optimization. To address this problem, we adopt the batch all triplet selection strategy, which selects all the possible triplets among samples to optimize instead of the hardest triplet. Furthermore, we introduce Unified Batch All Triplet loss and Cosine Softmax loss to collaboratively optimize the cosine distance between image vectors. Similarly, we rewrite the Hetero Center Triplet loss, which was proposed for the VI-ReID task, into a batch all form to improve model performance. Extensive experiments indicate the effectiveness of the proposed methods, which outperform state-of-the-art methods by a wide margin.
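The difference between the two triplet selection strategies discussed above can be shown on a precomputed distance matrix. This is a generic sketch of batch-hard versus batch-all selection, with the margin of 0.3 and averaging over all valid triplets chosen as assumptions; it is not the Unified Batch All Triplet loss itself, which additionally involves cosine distance and modality handling not reproduced here.

```python
def batch_hard_triplet_loss(dist, labels, margin=0.3):
    """One triplet per anchor: farthest positive and closest negative."""
    n = len(labels)
    losses = []
    for a in range(n):
        pos = [dist[a][p] for p in range(n) if p != a and labels[p] == labels[a]]
        neg = [dist[a][q] for q in range(n) if labels[q] != labels[a]]
        if pos and neg:
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def batch_all_triplet_loss(dist, labels, margin=0.3):
    """Every valid (anchor, positive, negative) triplet contributes."""
    n = len(labels)
    losses = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            for q in range(n):
                if labels[q] == labels[a]:
                    continue
                losses.append(max(0.0, dist[a][p] - dist[a][q] + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 1D embeddings; distance is the absolute difference.
points = [0.0, 2.0, 1.0, 5.0]
labels = [0, 0, 1, 1]
dist = [[abs(a - b) for b in points] for a in points]

print(batch_hard_triplet_loss(dist, labels))
print(batch_all_triplet_loss(dist, labels))
```

Batch-hard looks at four triplets here while batch-all looks at eight, so samples that are never the hardest for any anchor still receive a training signal under the batch-all strategy.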
CVAug 21, 2020
Learning Domain-invariant Graph for Adaptive Semi-supervised Domain Adaptation with Few Labeled Source SamplesJinfeng Li, Weifeng Liu, Yicong Zhou et al.
Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain. Traditional domain adaptation algorithms assume that enough labeled data, which are treated as prior knowledge, are available in the source domain. However, these algorithms become infeasible when only a few labeled samples exist in the source domain, and their performance decreases significantly. To address this challenge, we propose a Domain-invariant Graph Learning (DGL) approach for domain adaptation with only a few labeled source samples. First, DGL introduces the Nystrom method to construct a plastic graph that shares similar geometric properties with the target domain. Then, DGL employs the Nystrom approximation error to measure the divergence between the plastic graph and the source graph, formalizing the distribution mismatch from a geometric perspective. By minimizing the approximation error, DGL learns a domain-invariant geometric graph to bridge the source and target domains. Finally, we integrate the learned domain-invariant graph with semi-supervised learning and further propose an adaptive semi-supervised model to handle cross-domain problems. The results of extensive experiments on popular datasets verify the superiority of DGL, especially when only a few labeled source samples are available.
MMMar 28, 2019
Universal chosen-ciphertext attack for a family of image encryption schemesJunxin Chen, Lei Chen, Yicong Zhou
Over the past decades, nonlinear dynamics and the permutation-substitution architecture have become very popular for image encryption. Such encryption schemes have three primary procedures: the key schedule module for producing encryption factors, permutation for image scrambling, and substitution for pixel modification. Under the assumption of a chosen-ciphertext attack, we evaluate the security of a class of image ciphers that adopt pixel-level permutation and modular addition for substitution. It is mathematically revealed that the mapping from differentials of ciphertexts to those of plaintexts is linear and has nothing to do with the key schedules, permutation techniques and encryption rounds. Moreover, a universal chosen-ciphertext attack is proposed and validated. Experimental results demonstrate that the plaintexts can be directly reconstructed without any security key or encryption elements. Related cryptographic discussions are also given.
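Why such a cipher is vulnerable can be illustrated on a toy example. The sketch below assumes a hypothetical cipher in which every round permutes the pixel positions and then adds a key stream modulo 256. It is not one of the schemes analyzed in the paper, and `recover_plaintext` is only an illustration of the linearity argument, not the authors' attack. Because decryption of this toy cipher is affine modulo 256, the difference between two decrypted outputs depends only on the difference between the ciphertexts, so a decryption oracle queried on the all-zero image and on unit images is enough to decrypt any target.

```python
import random

MOD = 256


def make_cipher(n, rounds, seed):
    """Toy cipher: each round permutes positions, then adds a key stream mod 256."""
    rng = random.Random(seed)
    perms = [rng.sample(range(n), n) for _ in range(rounds)]
    keys = [[rng.randrange(MOD) for _ in range(n)] for _ in range(rounds)]

    def encrypt(plain):
        x = list(plain)
        for perm, key in zip(perms, keys):
            x = [(x[perm[i]] + key[i]) % MOD for i in range(n)]
        return x

    def decrypt(cipher):
        x = list(cipher)
        for perm, key in zip(reversed(perms), reversed(keys)):
            y = [0] * n
            for i in range(n):
                y[perm[i]] = (x[i] - key[i]) % MOD
            x = y
        return x

    return encrypt, decrypt


def recover_plaintext(decrypt_oracle, target):
    """Decrypt `target` using only oracle queries on chosen ciphertexts."""
    n = len(target)
    base = decrypt_oracle([0] * n)          # decryption of the all-zero image
    plain = list(base)
    for i in range(n):
        unit = [0] * n
        unit[i] = 1                          # ciphertext differential e_i
        probe = decrypt_oracle(unit)
        for j in range(n):
            delta = (probe[j] - base[j]) % MOD   # plaintext differential
            plain[j] = (plain[j] + target[i] * delta) % MOD
    return plain


encrypt, decrypt = make_cipher(n=16, rounds=3, seed=7)
secret = [random.Random(1).randrange(MOD) for _ in range(16)]
ciphertext = encrypt(secret)
recovered = recover_plaintext(decrypt, ciphertext)
```

The attack never touches the permutations or key streams, and the number of rounds does not matter, because a composition of affine maps is still affine.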
CVJun 21, 2018
Ensemble p-Laplacian Regularization for Remote Sensing Image RecognitionXueqi Ma, Weifeng Liu, Dapeng Tao et al.
Recently, manifold regularized semi-supervised learning (MRSSL) has received considerable attention because it successfully exploits the geometry of the intrinsic data probability distribution, including both labeled and unlabeled samples, to improve the performance of a learning model. As a natural nonlinear generalization of the graph Laplacian, the p-Laplacian has been shown to have rich theoretical foundations and to better preserve the local structure. However, it is difficult to determine the fitting graph p-Laplacian, i.e. the parameter p, which is a critical factor for the performance of the graph p-Laplacian. Therefore, we develop an ensemble p-Laplacian regularization (EpLapR) to fully approximate the intrinsic manifold of the data distribution. EpLapR incorporates multiple graphs into a regularization term in order to sufficiently explore the complementarity of graph p-Laplacians. Specifically, we construct a fused graph by introducing an optimization approach that assigns suitable weights to graphs with different p values. We then conduct semi-supervised learning on the fused graph. Extensive experiments on the UC-Merced data set demonstrate the effectiveness and efficiency of the proposed method.
CVJun 21, 2018
Hypergraph p-Laplacian Regularization for Remote Sensing Image RecognitionXueqi Ma, Weifeng Liu, Shuying Li et al.
It is of great importance to preserve locality and similarity information in semi-supervised learning (SSL) based applications. Graph based SSL and manifold regularization based SSL, including Laplacian regularization (LapR) and Hypergraph Laplacian regularization (HLapR), are representative SSL methods and have achieved prominent performance by exploiting the relationship of sample distribution. However, it is still a great challenge to exactly explore and exploit the local structure of the data distribution. In this paper, we present an efficient and effective approximation algorithm of the Hypergraph p-Laplacian and then propose Hypergraph p-Laplacian regularization (HpLapR) to preserve the geometry of the probability distribution. In particular, the p-Laplacian is a nonlinear generalization of the standard graph Laplacian, and a hypergraph is a generalization of a standard graph. Therefore, the proposed HpLapR has more potential to preserve the local structure. We apply HpLapR to logistic regression and implement it for remote sensing image recognition. We compare the proposed HpLapR to several popular manifold regularization based SSL methods including LapR, HLapR and HpLapR on the UC-Merced dataset. The experimental results demonstrate the superiority of the proposed HpLapR.
CVNov 19, 2017
Vision Recognition using Discriminant Sparse Optimization LearningQingxiang Feng, Yicong Zhou
To better select the correct training samples and obtain a robust representation of the query sample, this paper proposes a discriminant-based sparse optimization learning model. This learning model integrates discriminant and sparsity together. Based on this model, we then propose a classifier called locality-based discriminant sparse representation (LDSR). Because the discriminant helps to increase the difference between samples in different classes and to decrease the difference between samples within the same class, LDSR can obtain better sparse coefficients and constitute a better sparse representation for classification. In order to take advantage of kernel techniques, discriminant and sparsity, we further propose a nonlinear classifier called kernel locality-based discriminant sparse representation (KLDSR). Experiments on several well-known databases show that LDSR and KLDSR perform better than several state-of-the-art methods, including deep learning based methods.
CVNov 19, 2017
Discriminant Projection Representation-based Classification for Vision RecognitionQingxiang Feng, Yicong Zhou
Representation-based classification methods such as sparse representation-based classification (SRC) and linear regression classification (LRC) have attracted a lot of attention. In order to obtain a better representation, a novel method called projection representation-based classification (PRC) is proposed for image recognition in this paper. PRC is based on a new mathematical model. This model states that the 'ideal projection' of a sample point $x$ on the hyper-space $H$ may be obtained by iteratively computing the projection of $x$ on a line of hyper-space $H$ with a proper strategy. Therefore, PRC is able to iteratively approximate the 'ideal representation' of each subject for classification. Moreover, the discriminant PRC (DPRC) is further proposed, which obtains the discriminant information by maximizing the ratio of the between-class reconstruction error over the within-class reconstruction error. Experimental results on five typical databases show that the proposed PRC and DPRC are effective and outperform other state-of-the-art methods on several vision recognition tasks.
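The notion of reaching a projection onto a subspace through repeated projections onto lines can be illustrated with a small sketch. The abstract does not specify the selection strategy, so the version below assumes a simple cyclic sweep over the spanning vectors, which amounts to alternating projections on the residual. The function name `iterative_line_projection` is hypothetical and this is not the paper's PRC algorithm.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def iterative_line_projection(x, basis, sweeps=60):
    """Approximate the orthogonal projection of x onto span(basis).

    Each step projects the current residual onto the line spanned by one
    basis vector and moves that component from the residual to the estimate.
    """
    residual = list(x)
    estimate = [0.0] * len(x)
    for _ in range(sweeps):
        for v in basis:
            coeff = dot(residual, v) / dot(v, v)
            estimate = [e + coeff * vi for e, vi in zip(estimate, v)]
            residual = [r - coeff * vi for r, vi in zip(residual, v)]
    return estimate, residual


# Subspace H: the xy-plane, spanned by two non-orthogonal vectors.
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0]
projection, residual = iterative_line_projection(x, basis)
```

In this example the estimate converges to the point of the xy-plane closest to `x`, and the residual ends up orthogonal to both spanning vectors.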
CVNov 19, 2017
Color Face Recognition using High-Dimension Quaternion-based Adaptive RepresentationQingxiang Feng, Yicong Zhou
Recently, quaternion collaborative representation-based classification (QCRC) and quaternion sparse representation-based classification (QSRC) have been proposed for color face recognition. They can obtain correlation information among different color channels. However, their performance is unstable under different conditions. For example, QSRC performs better than QCRC in some situations but worse in others. To benefit from quaternion-based $e_2$-norm minimization in QCRC and quaternion-based $e_1$-norm minimization in QSRC, we propose the quaternion-based adaptive representation (QAR) that uses a quaternion-based $e_p$-norm minimization ($1 \le p \le 2$) for color face recognition. To obtain the high-dimension correlation information among different color channels, we further propose the high-dimension quaternion-based adaptive representation (HD-QAR). The experimental results demonstrate that the proposed QAR and HD-QAR achieve better recognition rates than QCRC, QSRC and several state-of-the-art methods.
CDDec 14, 2016
Nonlinear Chaotic Processing ModelZhongyun Hua, Yicong Zhou
Designing chaotic maps with complex dynamics is a challenging topic. This paper introduces the nonlinear chaotic processing (NCP) model, which contains six basic nonlinear operations. Each operation is a general framework that can use existing chaotic maps as seed maps to generate a huge number of new chaotic maps. The proposed NCP model can be easily extended by introducing new nonlinear operations or by arbitrarily combining existing ones. The properties and chaotic behaviors of the NCP model are investigated. To show its effectiveness and usability, we provide four example chaotic maps generated by the NCP model and evaluate their chaotic performance using the Lyapunov exponent, Shannon entropy, correlation dimension and initial state sensitivity. The experimental results show that these chaotic maps have more complex chaotic behaviors than existing ones.
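One of the evaluation measures named above, the Lyapunov exponent, can be estimated numerically for a one-dimensional map by averaging the logarithm of the absolute derivative along an orbit. The sketch below applies this to the logistic map, chosen here only as a familiar example of a seed map. The six NCP operations are not described in the abstract and are not reproduced. The function name, initial state and iteration counts are assumptions.

```python
import math


def logistic_lyapunov(r, x0=0.3, burn_in=1000, steps=10000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    The exponent is the orbit average of log|f'(x)| with f'(x) = r * (1 - 2x).
    A positive value indicates sensitive dependence on the initial state.
    """
    x = x0
    for _ in range(burn_in):            # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        slope = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(slope, 1e-300))   # guard against log(0)
        x = r * x * (1.0 - x)
    return total / steps


chaotic = logistic_lyapunov(4.0)   # fully chaotic regime
stable = logistic_lyapunov(2.5)    # orbit settles on a stable fixed point
```

For `r = 4` the exact exponent is known to be ln 2, and for `r = 2.5` the orbit converges to the fixed point 0.6, where the derivative has magnitude 0.5, so the estimate should be close to ln 0.5.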