Shengwu Xiong

h-index 39

19 papers

247 citations

Novelty 35%

AI Score 41

Ranked #69,363 of 194,257 authors (top 36%) | #23,598 in CV (top 40%)

19 Papers

8.4 | CV | Apr 26, 2023 | Code

ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving Few-Shot Learning

Yi Rong, Xiongbo Lu, Zhaoyang Sun et al.

Self-supervised learning (SSL) techniques have recently been integrated into the few-shot learning (FSL) framework and have shown promising results in improving the few-shot image classification performance. However, existing SSL approaches used in FSL typically seek the supervision signals from the global embedding of every single image. Therefore, during the episodic training of FSL, these methods cannot capture and fully utilize the local visual information in image samples and the data structure information of the whole episode, which are beneficial to FSL. To this end, we propose to augment the few-shot learning objective with a novel self-supervised Episodic Spatial Pretext Task (ESPT). Specifically, for each few-shot episode, we generate its corresponding transformed episode by applying a random geometric transformation to all the images in it. Based on these, our ESPT objective is defined as maximizing the local spatial relationship consistency between the original episode and the transformed one. With this definition, the ESPT-augmented FSL objective promotes learning more transferable feature representations that capture the local spatial features of different images and their inter-relational structural information in each input episode, thus enabling the model to generalize better to new categories with only a few samples. Extensive experiments indicate that our ESPT method achieves new state-of-the-art performance for few-shot image classification on three mainstay benchmark datasets. The source code will be available at: https://github.com/Whut-YiRong/ESPT.

15.8 | CV | Sep 8, 2024 | Code

Visual Grounding with Multi-modal Conditional Adaptation

Ruilin Yao, Shengwu Xiong, Yichen Zhao et al.

Visual grounding is the task of locating objects specified by natural language expressions. Existing methods extend generic object detection frameworks to tackle this task. They typically extract visual and textual features separately using independent visual and textual encoders, then fuse these features in a multi-modal decoder for final prediction. However, visual grounding presents unique challenges. It often involves locating objects with different text descriptions within the same image. Existing methods struggle with this task because the independent visual encoder produces identical visual features for the same image, limiting detection performance. Some recent approaches propose various language-guided visual encoders to address this issue, but they mostly rely solely on textual information and require sophisticated designs. In this paper, we introduce Multi-modal Conditional Adaptation (MMCA), which enables the visual encoder to adaptively update weights, directing its focus towards text-relevant regions. Specifically, we first integrate information from different modalities to obtain multi-modal embeddings. Then we utilize a set of weighting coefficients, which are generated from the multi-modal embeddings, to reorganize the weight update matrices and apply them to the visual encoder of the visual grounding model. Extensive experiments on four widely used datasets demonstrate that MMCA achieves significant improvements and state-of-the-art results. Ablation experiments further demonstrate the lightweight design and efficiency of our method. Our source code is available at: https://github.com/Mr-Bigworth/MMCA.

24.7 | CL | Mar 20, 2024 | Code

EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay et al.

Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassing a wide array of scripts, and are imbued with profound religious and cultural significance. This paper introduces EthioLLM -- multilingual large language models for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English, and Ethiobenchmark -- a new benchmark dataset for various downstream NLP tasks. We evaluate the performance of these models across five downstream NLP tasks. We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models. Our dataset and models are available at the https://huggingface.co/EthioNLP repository.

10.5 | CV | Dec 15, 2024 | Code

SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models

Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.

This paper studies the challenging task of makeup transfer, which aims to apply diverse makeup styles precisely and naturally to a given facial image. Due to the absence of paired data, current methods typically synthesize sub-optimal pseudo ground truths to guide the model training, resulting in low makeup fidelity. Additionally, different makeup styles generally have varying effects on a person's face, but existing methods struggle to deal with this diversity. To address these issues, we propose a novel Self-supervised Hierarchical Makeup Transfer (SHMT) method via latent diffusion models. Following a "decoupling-and-reconstruction" paradigm, SHMT works in a self-supervised manner, freeing itself from the misguidance of imprecise pseudo-paired data. Furthermore, to accommodate a variety of makeup styles, hierarchical texture details are decomposed via a Laplacian pyramid and selectively introduced to the content representation. Finally, we design a novel Iterative Dual Alignment (IDA) module that dynamically adjusts the injection condition of the diffusion model, allowing the alignment errors caused by the domain gap between content and makeup representations to be corrected. Extensive quantitative and qualitative analyses demonstrate the effectiveness of our method. Our code is available at https://github.com/Snowfallingplum/SHMT.
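The Laplacian pyramid mentioned in the SHMT abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel image with even dimensions at every level, uses 2x2 average pooling and nearest-neighbour upsampling in place of the usual Gaussian filtering, and all function names are hypothetical. Each level stores the detail lost by one downsampling step, so coarse and fine texture can be handled separately.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumes even height and width)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_residual]."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # High-frequency detail that the downsampling step discards.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)  # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by adding details back from coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current
```

Dropping or rescaling individual detail levels before calling `reconstruct` is one simple way to control how much fine texture is kept, which is the general idea behind selecting hierarchical texture details.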

6.2 | CV | Jul 25, 2025 | Code

Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation

Tianyu Zou, Shengwu Xiong, Ruilin Yao et al.

This paper studies the few-shot segmentation (FSS) task, which aims to segment objects belonging to unseen categories in a query image by learning a model on a small number of well-annotated support samples. Our analysis of two mainstream FSS paradigms reveals that the predictions made by prototype learning methods are usually conservative, while those of affinity learning methods tend to be more aggressive. This observation motivates us to balance the conservative and aggressive information captured by these two types of FSS frameworks so as to improve the segmentation performance. To achieve this, we propose a Prototype-Affinity Hybrid Network (PAHNet), which introduces a Prototype-guided Feature Enhancement (PFE) module and an Attention Score Calibration (ASC) module in each attention block of an affinity learning model (called affinity learner). These two modules utilize the predictions generated by a pre-trained prototype learning model (called prototype predictor) to enhance the foreground information in support and query image representations and suppress the mismatched foreground-background (FG-BG) relationships between them, respectively. In this way, the aggressiveness of the affinity learner can be effectively mitigated, thereby eventually increasing the segmentation accuracy of our PAHNet method. Experimental results show that PAHNet outperforms most recently proposed methods across 1-shot and 5-shot settings on both PASCAL-5$^i$ and COCO-20$^i$ datasets, suggesting its effectiveness. The code is available at: https://github.com/tianyu-zou/PAHNet

9.0 | CV | Dec 2, 2019 | Code

Patchy Image Structure Classification Using Multi-Orientation Region Transform

Xiaohan Yu, Yang Zhao, Yongsheng Gao et al.

Exterior contour and interior structure are both vital features for classifying objects. However, most existing methods consider the exterior contour feature and the internal structure feature separately, and thus fail when classifying patchy image structures that have similar contours and flexible structures. To address the above limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification. MORT is performed over multiple orientation regions at multiple scales to effectively integrate patchy features, and thus enables a better description of the shape in a coarse-to-fine manner. Moreover, the proposed MORT can be combined with deep convolutional neural network techniques to further enhance classification accuracy. Very encouraging experimental results on the challenging ultra-fine-grained cultivar recognition task, insect wing recognition task, and large variation butterfly recognition task are obtained, which demonstrate the effectiveness and superiority of the proposed MORT over the state-of-the-art methods in classifying patchy image structures. Our code and three patchy image structure datasets are available at: https://github.com/XiaohanYu-GU/MReT2019.

3.7 | CV | Dec 22, 2024

RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation

Zhaoyang Sun, Fei Du, Weihua Chen et al.

Recently, the success of text-to-image synthesis has greatly advanced the development of identity customization techniques, whose main goal is to produce realistic identity-specific photographs based on text prompts and reference face images. However, it is difficult for existing identity customization methods to simultaneously meet the various requirements of different real-world applications, including the identity fidelity of small faces, the control of face location, pose and expression, as well as the customization of multiple persons. To this end, we propose a scale-robust and fine-controllable method, namely RealisID, which learns different control capabilities through the cooperation between a pair of local and global branches. Specifically, by using cropping and up-sampling operations to filter out face-irrelevant information, the local branch concentrates on the fine control of facial details and the scale-robust identity fidelity within the face region. Meanwhile, the global branch manages the overall harmony of the entire image. It also controls the face location by taking the location guidance as input. As a result, RealisID can benefit from the complementarity of these two branches. Finally, by implementing our branches with two different variants of ControlNet, our method can be easily extended to handle multi-person customization, even when trained only on single-person datasets. Extensive experiments and ablation studies indicate the effectiveness of RealisID and verify its ability to fulfill all the requirements mentioned above.

7.1 | LG | Sep 28, 2025

How Effective Are Time-Series Models for Precipitation Nowcasting? A Comprehensive Benchmark for GNSS-based Precipitation Nowcasting

Yifang Zhang, Shengwu Xiong, Henan Wang et al.

Precipitation nowcasting, which aims to predict precipitation within the next 0 to 6 hours, is critical for disaster mitigation and real-time response planning. However, most time series forecasting benchmarks in meteorology are evaluated on variables with strong periodicity, such as temperature and humidity, which fail to reflect model capabilities in more complex and practically relevant meteorological scenarios like precipitation nowcasting. To address this gap, we propose RainfallBench, a benchmark designed for precipitation nowcasting, a highly challenging and practically relevant task characterized by zero inflation, temporal decay, and non-stationarity. The dataset is derived from five years of meteorological observations, recorded at hourly intervals across six essential variables, and collected from more than 140 Global Navigation Satellite System (GNSS) stations globally. In particular, it incorporates precipitable water vapor (PWV), a crucial indicator of rainfall that is absent in other datasets. We further design specialized evaluation protocols to assess model performance on key meteorological challenges, including multi-scale prediction, multi-resolution forecasting, and extreme rainfall events, benchmarking 17 state-of-the-art models across six major architectures on RainfallBench. Additionally, to address the zero-inflation and temporal decay issues overlooked by existing models, we introduce Bi-Focus Precipitation Forecaster (BFPF), a plug-and-play module that incorporates domain-specific priors to enhance rainfall time series forecasting. Statistical analysis and ablation studies validate the comprehensiveness of our dataset as well as the superiority of our methodology.

1.2 | CV | Aug 19, 2020

Scene Text Detection with Selected Anchor

Anna Zhu, Hang Du, Shengwu Xiong

Object proposal techniques with dense anchoring schemes have frequently been applied to scene text detection to achieve high recall. This yields a significant improvement in accuracy but wastes computation on searching, regression and classification. In this paper, we propose an anchor selection-based region proposal network (AS-RPN) that uses effectively selected anchors instead of dense anchors to extract text proposals. The centers, scales, aspect ratios and orientations of the anchors are learnable instead of fixed, which leads to high recall and a greatly reduced number of anchors. By replacing the anchor-based RPN in Faster RCNN, the AS-RPN-based Faster RCNN achieves performance comparable to previous state-of-the-art text detection approaches on standard benchmarks, including COCO-Text, ICDAR2013, ICDAR2015 and MSRA-TD500, using only single-scale, single-model (ResNet50) testing.

3.3 | CV | Mar 27, 2020

Local Facial Makeup Transfer via Disentangled Representation

Zhaoyang Sun, Wenxuan Liu, Feng Liu et al.

Facial makeup transfer aims to render a non-makeup face image in the makeup style of an arbitrary given makeup image while preserving face identity. The most advanced method separates makeup style information from face images to realize makeup transfer. However, makeup style includes several semantically clear local styles which are still entangled together. In this paper, we propose a novel unified adversarial disentangling network to further decompose face images into four independent components, i.e., personal identity, lips makeup style, eyes makeup style and face makeup style. Owing to the further disentangling of makeup style, our method can not only control the degree of global makeup style, but also flexibly regulate the degree of local makeup styles, which no other approach can do. For makeup removal, unlike other methods that regard makeup removal as the reverse process of makeup, we integrate makeup transfer and makeup removal into one unified framework and obtain multiple makeup removal results. Extensive experiments have demonstrated that our approach can produce more realistic and accurate makeup transfer results compared to the state-of-the-art methods.

0.9 | CV | Oct 11, 2019

From Species to Cultivar: Soybean Cultivar Recognition using Multiscale Sliding Chord Matching of Leaf Images

Bin Wang, Yongsheng Gao, Xiaohan Yu et al.

Leaf image recognition techniques have been actively researched for plant species identification. However, it remains unclear whether leaf patterns can provide sufficient information for cultivar recognition. This paper reports the first attempt at soybean cultivar recognition from plant leaves, which is not only a challenging research problem but also important for soybean cultivar evaluation, selection and production in agriculture. In this paper, we propose a novel multiscale sliding chord matching (MSCM) approach to extract leaf patterns that are distinctive for soybean cultivar identification. A chord is defined to slide along the contour for measuring the synchronised patterns of exterior shape and interior appearance of soybean leaf images. A multiscale sliding chord strategy is developed to extract features in a coarse-to-fine hierarchical order. A joint description that integrates the leaf descriptors from different parts of a soybean plant is proposed for further enhancing the discriminative power of cultivar description. We built a cultivar leaf image database, SoyCultivar, consisting of 1200 sample leaf images from 200 soybean cultivars for performance evaluation. Encouraging experimental results of the proposed method in comparison to the state-of-the-art leaf species recognition methods demonstrate the availability of cultivar information in soybean leaves and the effectiveness of the proposed MSCM for soybean cultivar identification, which may advance the research in leaf recognition from species to cultivar.

6.0 | CV | Aug 11, 2019

MobileFAN: Transferring Deep Hidden Representation for Face Alignment

Yang Zhao, Yifan Liu, Chunhua Shen et al.

Facial landmark detection is a crucial prerequisite for many face analysis applications. Deep learning-based methods currently dominate facial landmark detection. However, such works generally introduce a large number of parameters, resulting in high memory cost. In this paper, we aim for a lightweight as well as effective solution to facial landmark detection. To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder. The proposed MobileFAN, with only 8% of the model size and lower computational cost, achieves superior or equivalent performance compared with state-of-the-art models. Moreover, by transferring the geometric structural information of a face graph from a large complex model to our proposed MobileFAN through feature-aligned distillation and feature-similarity distillation, the performance of MobileFAN is further improved in effectiveness and efficiency for face alignment. Extensive experimental results on three challenging facial landmark estimation benchmarks including COFW, 300W and WFLW show the superiority of our proposed MobileFAN against state-of-the-art methods.

1.2 | NT | Apr 2, 2019

New Kloosterman sum identities from the Helleseth-Zinoviev result on $Z_{4}$-linear Goethals codes

Minglong Qi, Shengwu Xiong

In the paper of Tor Helleseth and Victor Zinoviev (Designs, Codes and Cryptography, 17, 269-288 (1999)), the number of solutions of the system of equations from $Z_{4}$-linear Goethals codes $G_{4}$ was determined and stated in Theorem 4. We found that Theorem 4 is wrong for $m$ even. In this note, we complete Theorem 4, and present a series of new Kloosterman sum identities deduced from Theorem 4. Moreover, we show that several previously established formulas on the Kloosterman sum identities can be rediscovered from Theorem 4 with much simpler proofs.

1.2 | IT | Jan 17, 2019

Two classes of linear codes with a few weights based on twisted Kloosterman sums

Minglong Qi, Shengwu Xiong

Linear codes with a few weights have wide applications in information security, data storage systems, consumer electronics and communication systems. The construction of linear codes with a few weights and the determination of their parameters are important research topics in coding theory. In this paper, we construct two classes of linear codes with a few weights and determine their complete weight enumerators based on twisted Kloosterman sums.

1.2 | IT | Nov 1, 2017

On the complete weight enumerators of some linear codes with a few weights

Minglong Qi, Shengwu Xiong, Jingling Yuan et al.

Linear codes with a few weights have important applications in authentication codes, secret sharing, consumer electronics, etc. The determination of parameters such as the Hamming weight distributions and complete weight enumerators of linear codes is an important research topic. In this paper, we consider some classes of linear codes with a few weights and determine the complete weight enumerators, from which the corresponding Hamming weight distributions are derived with the help of some sums involving the Legendre symbol.
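How a Hamming weight distribution follows from a complete weight enumerator can be shown by brute force on a toy code. The sketch below is only an illustration under stated assumptions: it enumerates all codewords rather than using the character-sum techniques of the paper, the function names are hypothetical, and the example generator matrix (the ternary tetracode) is not taken from the paper.

```python
from collections import Counter
from itertools import product

def codewords(gen, p):
    """Yield every codeword of the linear code over GF(p) generated by the rows of gen."""
    n = len(gen[0])
    for msg in product(range(p), repeat=len(gen)):
        yield tuple(sum(m * row[j] for m, row in zip(msg, gen)) % p
                    for j in range(n))

def complete_weight_enumerator(gen, p):
    """Map each composition (t_0, ..., t_{p-1}) to the number of codewords having it.

    t_a counts the coordinates of a codeword equal to the field element a.
    """
    cwe = Counter()
    for cw in codewords(gen, p):
        cwe[tuple(cw.count(a) for a in range(p))] += 1
    return cwe

def hamming_weight_distribution(cwe):
    """Collapse compositions to Hamming weights: weight = number of nonzero coordinates."""
    dist = Counter()
    for comp, count in cwe.items():
        dist[sum(comp[1:])] += count
    return dist

# Toy example (assumption, not from the paper): the [4, 2, 3] ternary tetracode.
TETRACODE = [[1, 0, 1, 1],
             [0, 1, 1, 2]]
```

For the tetracode, `hamming_weight_distribution(complete_weight_enumerator(TETRACODE, 3))` has a single nonzero weight, so it is a one-weight code; the complete weight enumerator additionally records which field elements appear in each codeword.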

2.5 | CR | Jul 31, 2017

On Some Exponential Sums Related to the Coulter's Polynomial

Minglong Qi, Shengwu Xiong, Jingling Yuan et al.

In this paper, the formulas of some exponential sums over finite fields related to Coulter's polynomial are settled based on Coulter's theorems on Weil sums, which may have potential applications in the construction of linear codes with few weights.

3.1 | CR | Feb 19, 2016

On the Nonexistence of the Ding-Helleseth-Martinsens Constructions of Almost Difference Set for Cyclotomic Classes of Order 6

Minglong Qi, Shengwu Xiong, Jingling Yuan et al.

Pseudorandom sequences with optimal three-level autocorrelation have important applications in CDMA communication systems. Constructing sequences with three-level autocorrelation is equivalent to finding cyclic almost difference sets as their supports. In a paper of Ding, Helleseth, and Martinsen, the authors developed a new method, known in the literature as the Ding-Helleseth-Martinsens Constructions, to construct almost difference sets using product sets between GF(2) and union sets of cyclotomic classes of order 4. In this correspondence, we show that there do not exist such constructions for cyclotomic classes of order 6.

3.1 | CR | Feb 18, 2016

On a Class of Almost Difference Sets Constructed by Using the Ding-Helleseth-Martinsens Constructions

Minglong Qi, Shengwu Xiong, Jingling Yuan et al.

Pseudorandom binary sequences with optimal balance and autocorrelation have many applications in stream ciphers, communication, coding theory, etc. It is known that binary sequences with three-level autocorrelation should have an almost difference set as their characteristic set. How to construct new families of almost difference sets is an important research topic in such fields as communication, coding theory and cryptography. In a work of Ding, Helleseth, and Martinsen in 2001, the authors developed a new method, known in the literature as the Ding-Helleseth-Martinsens Constructions, of constructing an almost difference set from product sets of GF(2) and the union of two cyclotomic classes of order four. In the present paper, we construct two classes of almost difference sets with product sets between GF(2) and union sets of the cyclotomic classes of order 12 using that method. In addition, we find that there do not exist Ding-Helleseth-Martinsens Constructions for the cyclotomic classes of order six and eight.
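The defining property of an almost difference set can be checked directly on small examples. The sketch below works in the cyclic group $Z_n$ rather than in the product groups used by the Ding-Helleseth-Martinsen constructions, so it illustrates the definition only, not the paper's method; the function names are hypothetical. It uses the common convention that a $k$-subset $D$ is an $(n, k, \lambda, t)$ almost difference set when the difference function takes the value $\lambda$ on exactly $t$ nonzero elements and $\lambda + 1$ on the remaining $n - 1 - t$.

```python
from collections import Counter

def difference_function(subset, n):
    """For each nonzero d in Z_n, count ordered pairs (x, y) in subset with x - y = d (mod n)."""
    elems = set(subset)
    counts = Counter({d: 0 for d in range(1, n)})
    for x in elems:
        for y in elems:
            if x != y:
                counts[(x - y) % n] += 1
    return counts

def almost_difference_set_params(subset, n):
    """Return (n, k, lam, t) if subset is an almost difference set in Z_n, else None."""
    k = len(set(subset))
    values = Counter(difference_function(subset, n).values())
    distinct = sorted(values)
    if len(distinct) == 1:
        # Ordinary difference set, treated here as the degenerate case t = n - 1.
        return (n, k, distinct[0], n - 1)
    if len(distinct) == 2 and distinct[1] == distinct[0] + 1:
        return (n, k, distinct[0], values[distinct[0]])
    return None

# Example (assumption for illustration): the quadratic residues modulo 13.
QR_13 = {1, 3, 4, 9, 10, 12}
```

The quadratic residues modulo 13 give parameters (13, 6, 2, 6), consistent with the counting identity $k(k-1) = t\lambda + (n-1-t)(\lambda+1)$, here $30 = 12 + 18$.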

3.2 | CR | Dec 12, 2015

On the Linear Complexity of Generalized Cyclotomic Quaternary Sequences with Length $2pq$

Minglong Qi, Shengwu Xiong, Jingling Yuan et al.

In this paper, the linear complexity over $\mathbf{GF}(r)$ of generalized cyclotomic quaternary sequences with period $2pq$ is determined, where $r$ is an odd prime such that $r \ge 5$ and $r\notin \lbrace p,q\rbrace$. The minimal value of the linear complexity is equal to $\tfrac{5pq+p+q+1}{4}$, which is greater than half of the period $2pq$. According to the Berlekamp-Massey algorithm, these sequences are viewed as good enough for use in cryptography. We also show that if the characteristic $r$ of the extension field $\mathbf{GF}(r^{m})$ is chosen so that $\bigl(\tfrac{r}{p}\bigr) = \bigl(\tfrac{r}{q}\bigr) = -1$, $r\nmid 3pq-1$, and $r\nmid 2pq-4$, then the linear complexity can reach the maximal value equal to the length of the sequences.
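The link between linear complexity and the Berlekamp-Massey algorithm can be made concrete with a short sketch. This is a generic textbook version of the algorithm over a prime field GF(p), not code from the paper, and it does not construct the generalized cyclotomic sequences themselves; the function name is hypothetical and `pow(x, -1, p)` requires Python 3.8 or later. The returned value is the length of the shortest linear feedback shift register that generates the given sequence, which is why a linear complexity above half the period is desirable: fewer observed terms than twice the linear complexity are not enough to recover the recurrence.

```python
def linear_complexity(seq, p):
    """Berlekamp-Massey over GF(p), p prime: length of the shortest LFSR generating seq."""
    n = len(seq)
    c = [1] + [0] * n  # current connection polynomial C(x)
    b = [1] + [0] * n  # copy of C(x) from before the last length change
    length, shift, last_d = 0, 1, 1
    for i in range(n):
        # Discrepancy between seq[i] and the value predicted by the current LFSR.
        d = seq[i] % p
        for j in range(1, length + 1):
            d = (d + c[j] * seq[i - j]) % p
        if d == 0:
            shift += 1
            continue
        coef = d * pow(last_d, -1, p) % p
        previous_c = c[:]
        # C(x) <- C(x) - (d / last_d) * x^shift * B(x)
        for j in range(n + 1 - shift):
            c[j + shift] = (c[j + shift] - coef * b[j]) % p
        if 2 * length <= i:
            length = i + 1 - length
            b, last_d, shift = previous_c, d, 1
        else:
            shift += 1
    return length
```

As a usage example, the Fibonacci sequence reduced modulo 7 satisfies a two-term recurrence, so `linear_complexity([1, 1, 2, 3, 5, 1, 6, 0, 6, 6], 7)` returns 2.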