Jianqing Zhu

CV
h-index26
18papers
772citations
Novelty46%
AI Score44

18 Papers

CLSep 21, 2023Code
AceGPT, Localizing Large Language Models in Arabic

Huang Huang, Fei Yu, Jianqing Zhu et al.

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic, alongside Reinforcement Learning with AI Feedback (RLAIF) employing a reward model attuned to local culture and values. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities. Comprehensive evaluations reveal that the resulting model, dubbed `AceGPT', sets the state-of-the-art standard for open Arabic LLMs across various benchmarks. Codes, data, and models are in https://github.com/FreedomIntelligence/AceGPT.

CVApr 29, 2022Code
A Challenging Benchmark of Anime Style Recognition

Haotang Li, Shengtao Guo, Kailin Lyu et al.

Given two images of different anime roles, anime style recognition (ASR) aims to learn abstract painting style to determine whether the two images are from the same work, which is an interesting but challenging problem. Unlike biometric recognition, such as face recognition, iris recognition, and person re-identification, ASR suffers from a much larger semantic gap but receives less attention. In this paper, we propose a challenging ASR benchmark. Firstly, we collect a large-scale ASR dataset (LSASRD), which contains 20,937 images of 190 anime works and each work at least has ten different roles. In addition to the large-scale, LSASRD contains a list of challenging factors, such as complex illuminations, various poses, theatrical colors and exaggerated compositions. Secondly, we design a cross-role protocol to evaluate ASR performance, in which query and gallery images must come from different roles to validate an ASR model is to learn abstract painting style rather than learn discriminative features of roles. Finally, we apply two powerful person re-identification methods, namely, AGW and TransReID, to construct the baseline performance on LSASRD. Surprisingly, the recent transformer model (i.e., TransReID) only acquires a 42.24% mAP on LSASRD. Therefore, we believe that the ASR task of a huge semantic gap deserves deep and long-term research. We will open our dataset and code at https://github.com/nkjcqvcpi/ASR.

AIFeb 2, 2023
FV-MgNet: Fully Connected V-cycle MgNet for Interpretable Time Series Forecasting

Jianqing Zhu, Juncai He, Lian Zhang et al.

By investigating iterative methods for a constrained linear model, we propose a new class of fully connected V-cycle MgNet for long-term time series forecasting, which is one of the most difficult tasks in forecasting. MgNet is a CNN model that was proposed for image classification based on the multigrid (MG) methods for solving discretized partial differential equations (PDEs). We replace the convolutional operations with fully connected operations in the existing MgNet and then apply them to forecasting problems. Motivated by the V-cycle structure in MG, we further propose the FV-MgNet, a V-cycle version of the fully connected MgNet, to extract features hierarchically. By evaluating the performance of FV-MgNet on popular data sets and comparing it with state-of-the-art models, we show that the FV-MgNet achieves better results with less memory usage and faster inference speed. In addition, we develop ablation experiments to demonstrate that the structure of FV-MgNet is the best choice among the many variants.

CVMar 16, 2023
Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Yi Xie, Huaidong Zhang, Xuemiao Xu et al.

Previous Knowledge Distillation based efficient image retrieval methods employs a lightweight network as the student model for fast inference. However, the lightweight student model lacks adequate representation capacity for effective knowledge imitation during the most critical early training period, causing final performance degeneration. To tackle this issue, we propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity. Specifically, the employed student model is initially a heavy model to fruitfully learn distilled knowledge in the early training epochs, and the student model is gradually compressed during the training. To dynamically adjust the model capacity, our dynamic framework inserts a learnable convolutional layer within each residual block in the student model as the channel importance indicator. The indicator is optimized simultaneously by the image retrieval loss and the compression loss, and a retrieval-guided gradient resetting mechanism is proposed to release the gradient conflict. Extensive experiments show that our method has superior inference speed and accuracy, e.g., on the VeRi-776 dataset, given the ResNet101 as a teacher, our method saves 67.13% model parameters and 65.67% FLOPs (around 24.13% and 21.94% higher than state-of-the-arts) without sacrificing accuracy (around 2.11% mAP higher than state-of-the-arts).

LGJun 24, 2022
TreeDRNet:A Robust Deep Model for Long Term Time Series Forecasting

Tian Zhou, Jianqing Zhu, Xue Wang et al.

Various deep learning models, especially some latest Transformer-based approaches, have greatly improved the state-of-art performance for long-term time series forecasting.However, those transformer-based models suffer a severe deterioration performance with prolonged input length, which prohibits them from using extended historical info.Moreover, these methods tend to handle complex examples in long-term forecasting with increased model complexity, which often leads to a significant increase in computation and less robustness in performance(e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce doubly residual link structure to make prediction more robust.Built upon Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensemble, and a tree structure to further utilize the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting work, TreeDRNet is built entirely on multilayer perceptron and thus enjoys high computational efficiency. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40% for multivariate time series. In particular, TreeDRNet is over 10 times more efficient than transformer-based methods. The code will be released soon.

LGFeb 2, 2023
An Enhanced V-cycle MgNet Model for Operator Learning in Numerical Partial Differential Equations

Jianqing Zhu, Juncai He, Qiumei Huang

This study used a multigrid-based convolutional neural network architecture known as MgNet in operator learning to solve numerical partial differential equations (PDEs). Given the property of smoothing iterations in multigrid methods where low-frequency errors decay slowly, we introduced a low-frequency correction structure for residuals to enhance the standard V-cycle MgNet. The enhanced MgNet model can capture the low-frequency features of solutions considerably better than the standard V-cycle MgNet. The numerical results obtained using some standard operator learning tasks are better than those obtained using many state-of-the-art methods, demonstrating the efficiency of our model.Moreover, numerically, our new model is more robust in case of low- and high-resolution data during training and testing, respectively.

LGAug 16, 2024
S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

Xue Wang, Tian Zhou, Jianqing Zhu et al.

Attention based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes the vanilla Attention based models hard to apply to long sequence tasks. Various improved Attention structures are proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational costs. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S$^3$Attention, which significantly improves upon the previous attempts to negotiate this trade-off. S$^3$Attention has two mechanisms to effectively minimize the impact of noise while keeping the linear complexity to the sequence length: a smoothing block to mix information over long sequences and a matrix sketching method that simultaneously selects columns and rows from the input matrix. We verify the effectiveness of S$^3$Attention both theoretically and empirically. Extensive studies over Long Range Arena (LRA) datasets and six time-series forecasting show that S$^3$Attention significantly outperforms both vanilla Attention and other state-of-the-art variants of Attention structures.

CLDec 4, 2024Code
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Juhao Liang, Zhenyang Cai, Jianqing Zhu et al.

The alignment of large language models (LLMs) is critical for developing effective and safe language models. Traditional approaches focus on aligning models during the instruction tuning or reinforcement learning stages, referred to in this paper as `post alignment'. We argue that alignment during the pre-training phase, which we term `native alignment', warrants investigation. Native alignment aims to prevent unaligned content from the beginning, rather than relying on post-hoc processing. This approach leverages extensively aligned pre-training data to enhance the effectiveness and usability of pre-trained models. Our study specifically explores the application of native alignment in the context of Arabic LLMs. We conduct comprehensive experiments and ablation studies to evaluate the impact of native alignment on model performance and alignment stability. Additionally, we release open-source Arabic LLMs that demonstrate state-of-the-art performance on various benchmarks, providing significant benefits to the Arabic LLM community.

CLDec 16, 2024Code
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion

Jianqing Zhu, Huang Huang, Zhihang Lin et al.

This paper addresses the critical need for democratizing large language models (LLM) in the Arab world, a region that has seen slower progress in developing models comparable to state-of-the-art offerings like GPT-4 or ChatGPT 3.5, due to a predominant focus on mainstream languages (e.g., English and Chinese). One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. However, using a different vocabulary often leads to a degradation of learned knowledge since many words are initially out-of-vocabulary (OOV) when training starts. Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion, which is implemented by a modified BPE algorithm that progressively extends the Arabic subwords in its dynamic vocabulary during training, thereby balancing the OOV ratio at every stage. The ablation study demonstrated the effectiveness of Progressive Vocabulary Expansion. Moreover, AraLLaMA achieves decent performance comparable to the best Arabic LLMs across a variety of Arabic benchmarks. Models, training data, benchmarks, and codes will be all open-sourced.

CLNov 16, 2025
TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction

Jie Zhang, Bo Tang, Wanzi Shao et al.

Retrieval-Augmented Generation (RAG) improves large language models by retrieving external knowledge, often truncated into smaller chunks due to the input context window, which leads to information loss, resulting in response hallucinations and broken reasoning chains. Moreover, traditional RAG retrieves unstructured knowledge, introducing irrelevant details that hinder accurate reasoning. To address these issues, we propose TAdaRAG, a novel RAG framework for on-the-fly task-adaptive knowledge graph construction from external sources. Specifically, we design an intent-driven routing mechanism to a domain-specific extraction template, followed by supervised fine-tuning and a reinforcement learning-based implicit extraction mechanism, ensuring concise, coherent, and non-redundant knowledge integration. Evaluations on six public benchmarks and a real-world business benchmark (NowNewsQA) across three backbone models demonstrate that TAdaRAG outperforms existing methods across diverse domains and long-text tasks, highlighting its strong generalization and practical effectiveness.

CVSep 5, 2025
Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution

Haosong Liu, Xiancheng Zhu, Huanqiang Zeng et al.

Recently, Mamba-based methods, with its advantage in long-range information modeling and linear complexity, have shown great potential in optimizing both computational cost and performance of light field image super-resolution (LFSR). However, current multi-directional scanning strategies lead to inefficient and redundant feature extraction when applied to complex LF data. To overcome this challenge, we propose a Subspace Simple Scanning (Sub-SS) strategy, based on which we design the Subspace Simple Mamba Block (SSMB) to achieve more efficient and precise feature extraction. Furthermore, we propose a dual-stage modeling strategy to address the limitation of state space in preserving spatial-angular and disparity information, thereby enabling a more comprehensive exploration of non-local spatial-angular correlations. Specifically, in stage I, we introduce the Spatial-Angular Residual Subspace Mamba Block (SA-RSMB) for shallow spatial-angular feature extraction; in stage II, we use a dual-branch parallel structure combining the Epipolar Plane Mamba Block (EPMB) and Epipolar Plane Transformer Block (EPTB) for deep epipolar feature refinement. Building upon meticulously designed modules and strategies, we introduce a hybrid Mamba-Transformer framework, termed LFMT. LFMT integrates the strengths of Mamba and Transformer models for LFSR, enabling comprehensive information exploration across spatial, angular, and epipolar-plane domains. Experimental results demonstrate that LFMT significantly outperforms current state-of-the-art methods in LFSR, achieving substantial improvements in performance while maintaining low computational complexity on both real-word and synthetic LF datasets.

CVDec 14, 2021
An Interpretive Constrained Linear Model for ResNet and MgNet

Juncai He, Jinchao Xu, Lian Zhang et al.

We propose a constrained linear data-feature-mapping model as an interpretable mathematical model for image classification using a convolutional neural network (CNN). From this viewpoint, we establish detailed connections between the traditional iterative schemes for linear systems and the architectures of the basic blocks of ResNet- and MgNet-type models. Using these connections, we present some modified ResNet models that compared with the original models have fewer parameters and yet can produce more accurate results, thereby demonstrating the validity of this constrained learning data-feature-mapping assumption. Based on this assumption, we further propose a general data-feature iterative scheme to show the rationality of MgNet. We also provide a systematic numerical study on MgNet to show its success and advantages in image classification problems and demonstrate its advantages in comparison with established networks.

CVJul 12, 2021
GiT: Graph Interactive Transformer for Vehicle Re-identification

Fei Shen, Yi Xie, Jianqing Zhu et al.

Transformers are more and more popular in computer vision, which treat an image as a sequence of patches and learn robust global features from the sequence. However, pure transformers are not entirely suitable for vehicle re-identification because vehicle re-identification requires both robust global features and discriminative local features. For that, a graph interactive transformer (GiT) is proposed in this paper. In the macro view, a list of GiT blocks are stacked to build a vehicle re-identification model, in where graphs are to extract discriminative local features within patches and transformers are to extract robust global features among patches. In the micro view, graphs and transformers are in an interactive status, bringing effective cooperation between local and global features. Specifically, one current graph is embedded after the former level's graph and transformer, while the current transform is embedded after the current graph and the former level's transformer. In addition to the interaction between graphs and transforms, the graph is a newly-designed local correction graph, which learns discriminative local features within a patch by exploring nodes' relationships. Extensive experiments on three large-scale vehicle re-identification datasets demonstrate that our GiT method is superior to state-of-the-art vehicle re-identification approaches.

CVMay 29, 2020
Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification

Fei Shen, Jianqing Zhu, Xiaobin Zhu et al.

Existing vehicle re-identification methods commonly use spatial pooling operations to aggregate feature maps extracted via off-the-shelf backbone networks. They ignore exploring the spatial significance of feature maps, eventually degrading the vehicle re-identification performance. In this paper, firstly, an innovative spatial graph network (SGN) is proposed to elaborately explore the spatial significance of feature maps. The SGN stacks multiple spatial graphs (SGs). Each SG assigns feature map's elements as nodes and utilizes spatial neighborhood relationships to determine edges among nodes. During the SGN's propagation, each node and its spatial neighbors on an SG are aggregated to the next SG. On the next SG, each aggregated node is re-weighted with a learnable parameter to find the significance at the corresponding location. Secondly, a novel pyramidal graph network (PGN) is designed to comprehensively explore the spatial significance of feature maps at multiple scales. The PGN organizes multiple SGNs in a pyramidal manner and makes each SGN handles feature maps of a specific scale. Finally, a hybrid pyramidal graph network (HPGN) is developed by embedding the PGN behind a ResNet-50 based backbone network. Extensive experiments on three large scale vehicle databases (i.e., VeRi776, VehicleID, and VeRi-Wild) demonstrate that the proposed HPGN is superior to state-of-the-art vehicle re-identification approaches.

CVNov 13, 2018
Vehicle Re-identification Using Quadruple Directional Deep Learning Features

Jianqing Zhu, Huanqiang Zeng, Jingchang Huang et al.

In order to resist the adverse effect of viewpoint variations for improving vehicle re-identification performance, we design quadruple directional deep learning networks to extract quadruple directional deep learning features (QD-DLF) of vehicle images. The quadruple directional deep learning networks are with similar overall architecture, including the same basic deep learning architecture but different directional feature pooling layers. Specifically, the same basic deep learning architecture is a shortly and densely connected convolutional neural network to extract basic feature maps of an input square vehicle image in the first stage. Then, the quadruple directional deep learning networks utilize different directional pooling layers, i.e., horizontal average pooling (HAP) layer, vertical average pooling (VAP) layer, diagonal average pooling (DAP) layer and anti-diagonal average pooling (AAP) layer, to compress the basic feature maps into horizontal, vertical, diagonal and anti-diagonal directional feature maps, respectively. Finally, these directional feature maps are spatially normalized and concatenated together as a quadruple directional deep learning feature for vehicle re-identification. Extensive experiments on both VeRi and VehicleID databases show that the proposed QD-DLF approach outperforms multiple state-of-the-art vehicle re-identification methods.

CVDec 3, 2017
Automatic Recognition of Coal and Gangue based on Convolution Neural Network

Huichao Hong, Lixin Zheng, Jianqing Zhu et al.

We designed a gangue sorting system,and built a convolutional neural network model based on AlexNet. Data enhancement and transfer learning are used to solve the problem which the convolution neural network has insufficient training data in the training stage. An object detection and region clipping algorithm is proposed to adjust the training image data to the optimum size. Compared with traditional neural network and SVM algorithm, this algorithm has higher recognition rate for coal and coal gangue, and provides important reference for identification and separation of coal and gangue.

CVFeb 16, 2017
Deep Hybrid Similarity Learning for Person Re-identification

Jianqing Zhu, Huanqiang Zeng, Shengcai Liao et al.

Person Re-IDentification (Re-ID) aims to match person images captured from two non-overlapping cameras. In this paper, a deep hybrid similarity learning (DHSL) method for person Re-ID based on a convolution neural network (CNN) is proposed. In our approach, a CNN learning feature pair for the input image pair is simultaneously extracted. Then, both the element-wise absolute difference and multiplication of the CNN learning feature pair are calculated. Finally, a hybrid similarity function is designed to measure the similarity between the feature pair, which is realized by learning a group of weight coefficients to project the element-wise absolute difference and multiplication into a similarity score. Consequently, the proposed DHSL method is able to reasonably assign parameters of feature learning and metric learning in a CNN so that the performance of person Re-ID is improved. Experiments on three challenging person Re-ID databases, QMUL GRID, VIPeR and CUHK03, illustrate that the proposed DHSL method is superior to multiple state-of-the-art person Re-ID methods.

CVAug 5, 2014
Open-set Person Re-identification

Shengcai Liao, Zhipeng Mo, Jianqing Zhu et al.

Person re-identification is becoming a hot research for developing both machine learning algorithms and video surveillance applications. The task of person re-identification is to determine which person in a gallery has the same identity to a probe image. This task basically assumes that the subject of the probe image belongs to the gallery, that is, the gallery contains this person. However, in practical applications such as searching a suspect in a video, this assumption is usually not true. In this paper, we consider the open-set person re-identification problem, which includes two sub-tasks, detection and identification. The detection sub-task is to determine the presence of the probe subject in the gallery, and the identification sub-task is to determine which person in the gallery has the same identity as the accepted probe. We present a database collected from a video surveillance setting of 6 cameras, with 200 persons and 7,413 images segmented. Based on this database, we develop a benchmark protocol for evaluating the performance under the open-set person re-identification scenario. Several popular metric learning algorithms for person re-identification have been evaluated as baselines. From the baseline performance, we observe that the open-set person re-identification problem is still largely unresolved, thus further attention and effort is needed.