CV · Oct 25, 2023 · Code
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Yiming Lin, Xiao-Bo Jin, Qiufeng Wang et al.
Panoptic Narrative Grounding (PNG) is an emerging visual grounding task that aims to segment visual objects in images based on dense narrative captions. The current state-of-the-art methods first refine the representation of each phrase by aggregating the $k$ most similar image pixels, and then match the refined text representations with the pixels of the image feature map to generate segmentation results. However, simply aggregating sampled image features ignores contextual information, which can lead to phrase-to-pixel mismatch. In this paper, we propose a novel learning framework called Deformable Attention Refined Matching Network (DRMN), whose main idea is to bring deformable attention into the iterative process of feature learning to incorporate essential context information from pixels at different scales. DRMN iteratively re-encodes pixels with the deformable attention network after updating the feature representation of the top-$k$ most similar pixels. As such, DRMN can produce accurate yet discriminative pixel representations, purify the top-$k$ most similar pixels, and consequently alleviate phrase-to-pixel mismatch substantially. Experimental results show that our design significantly improves the matching between text phrases and image pixels. Concretely, DRMN achieves new state-of-the-art performance on the PNG benchmark with an average recall improvement of 3.5%. The code is available at https://github.com/JaMesLiMers/DRMN.
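The baseline refine-then-match step described in this abstract can be sketched roughly as below. This is an illustration only, not DRMN: the dot-product similarity, the mean aggregation, the residual update, and the sigmoid scoring are all assumptions, and the names `refine_phrase` and `match` are hypothetical. The deformable-attention re-encoding of pixels is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def refine_phrase(phrase, pixels, k):
    """Add the mean of the k pixel features most similar to `phrase` (assumed update rule)."""
    # Rank pixel indices by dot-product similarity to the phrase vector.
    ranked = sorted(range(len(pixels)), key=lambda i: dot(phrase, pixels[i]), reverse=True)
    top = ranked[:k]
    # Average the selected pixel features dimension by dimension.
    mean = [sum(pixels[i][d] for i in top) / len(top) for d in range(len(phrase))]
    refined = [p + m for p, m in zip(phrase, mean)]
    return refined, top

def match(phrase, pixels):
    """Per-pixel matching scores in (0, 1), one score per pixel feature."""
    return [1.0 / (1.0 + math.exp(-dot(phrase, px))) for px in pixels]

phrase = [1.0, 0.0]
pixels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [-1.0, 0.0]]
refined, top = refine_phrase(phrase, pixels, k=2)
scores = match(refined, pixels)
```

In this toy setting, pixels that are not selected contribute nothing to the refined phrase, which is the loss of context the abstract points to.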
AI · Dec 3, 2024
F-SE-LSTM: A Time Series Anomaly Detection Method with Frequency Domain Information
Yi-Xiang Lu, Xiao-Bo Jin, Jian Chen et al.
With the development of society, time series anomaly detection plays an important role in network and IoT services. However, most existing anomaly detection methods directly analyze time series in the time domain and cannot distinguish some relatively hidden anomaly sequences. We attempt to analyze the impact of frequency on time series from a frequency domain perspective, thus proposing a new time series anomaly detection method called F-SE-LSTM. This method utilizes two sliding windows and fast Fourier transform (FFT) to construct a frequency matrix. Simultaneously, Squeeze-and-Excitation Networks (SENet) and Long Short-Term Memory (LSTM) are employed to extract frequency-related features within and between periods. Through comparative experiments on multiple datasets such as Yahoo Webscope S5 and Numenta Anomaly Benchmark, the results demonstrate that the frequency matrix constructed by F-SE-LSTM exhibits better discriminative ability than ordinary time domain and frequency domain data. Furthermore, F-SE-LSTM outperforms existing state-of-the-art deep learning anomaly detection methods in terms of anomaly detection capability and execution efficiency.
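One way to picture the frequency matrix built from two sliding windows is sketched below. The abstract does not give window sizes or the exact layout, so this assumes an outer window that selects a segment and an inner window that slides inside it, with one magnitude spectrum per row. A naive DFT stands in for the FFT to keep the sketch dependency-free, and all function and parameter names are hypothetical.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the non-redundant DFT bins of a real sequence (naive O(n^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(n // 2 + 1)]

def frequency_matrix(segment, inner, step):
    """Slide an inner window over one segment; each row is one spectrum."""
    return [dft_magnitudes(segment[s:s + inner])
            for s in range(0, len(segment) - inner + 1, step)]

def frequency_matrices(series, outer, inner, step, stride=1):
    """Slide an outer window over the series; one frequency matrix per position."""
    return [frequency_matrix(series[s:s + outer], inner, step)
            for s in range(0, len(series) - outer + 1, stride)]

# A sine wave with period 4 concentrates its energy in a single frequency bin.
series = [math.sin(2 * math.pi * t / 4) for t in range(16)]
matrix = frequency_matrix(series, inner=8, step=4)
```

Each matrix could then be fed to a model that looks across frequency bins within a row and across rows over time, which is the role the abstract assigns to SENet and LSTM.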
CV · Oct 17, 2025
Hyperbolic Structured Classification for Robust Single Positive Multi-label Learning
Yiming Lin, Shang Wang, Junkai Zhou et al.
Single Positive Multi-Label Learning (SPMLL) addresses the challenging scenario where each training sample is annotated with only one positive label despite potentially belonging to multiple categories, making it difficult to capture complex label relationships and hierarchical structures. Existing methods model label relationships only implicitly through distance-based similarity and lack explicit geometric definitions for different relationship types. To address these limitations, we propose the first hyperbolic classification framework for SPMLL that represents each label as a hyperbolic ball rather than a point or vector, enabling rich inter-label relationship modeling through geometric ball interactions. Our ball-based approach naturally captures multiple relationship types simultaneously: inclusion for hierarchical structures, overlap for co-occurrence patterns, and separation for semantic independence. Further, we introduce two key components: a temperature-adaptive hyperbolic ball classifier and a physics-inspired double-well regularization that guides balls toward meaningful configurations. Extensive experiments on four benchmark datasets (MS-COCO, PASCAL VOC, NUS-WIDE, CUB-200-2011) demonstrate competitive performance with superior interpretability compared to existing methods. Furthermore, statistical analysis reveals a strong correlation between learned embeddings and real-world co-occurrence patterns, establishing hyperbolic geometry as a more robust paradigm for structured classification under incomplete supervision.
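The three ball relationships named in this abstract can be illustrated with a small sketch. It assumes the Poincaré ball model and radii measured in hyperbolic distance; the abstract does not say which hyperbolic model or radius convention the paper uses, and the function names are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

def ball_relation(c1, r1, c2, r2):
    """Classify two label balls (center, hyperbolic radius) by their geometry."""
    d = poincare_distance(c1, c2)
    if d + min(r1, r2) <= max(r1, r2):
        return "inclusion"   # the smaller ball lies inside the larger one
    if d < r1 + r2:
        return "overlap"     # the balls intersect without nesting
    return "separation"      # the balls are disjoint

origin, other = (0.0, 0.0), (0.5, 0.0)
relations = [
    ball_relation(origin, 3.0, other, 0.5),
    ball_relation(origin, 1.0, other, 0.5),
    ball_relation(origin, 0.3, other, 0.3),
]
```

Under these assumptions, a broad label and a more specific one would show up as a large ball containing a small one, while unrelated labels would end up disjoint.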
CV · Aug 29, 2025
The Demon is in Ambiguity: Revisiting Situation Recognition with Single Positive Multi-Label Learning
Yiming Lin, Yuchen Niu, Shang Wang et al.
Situation recognition (SR) is a fundamental task in computer vision that aims to extract structured semantic summaries from images by identifying key events and their associated entities. Specifically, given an input image, the model must first classify the main visual events (verb classification), then identify the participating entities and their semantic roles (semantic role labeling), and finally localize these entities in the image (semantic role localization). Existing methods treat verb classification as a single-label problem, but we show through a comprehensive analysis that this formulation fails to address the inherent ambiguity in visual event recognition, as multiple verb categories may reasonably describe the same image. This paper makes three key contributions. First, we reveal through empirical analysis that verb classification is inherently a multi-label problem due to the ubiquitous semantic overlap between verb categories. Second, given the impracticality of fully annotating large-scale datasets with multiple labels, we propose to reformulate verb classification as a single positive multi-label learning (SPMLL) problem, a novel perspective in SR research. Third, we design a comprehensive multi-label evaluation benchmark for SR that fairly evaluates model performance in a multi-label setting. To address the challenges of SPMLL, we further develop the Graph Enhanced Verb Multilayer Perceptron (GE-VerbMLP), which combines graph neural networks to capture label correlations and adversarial training to optimize decision boundaries. Extensive experiments on real-world datasets show that our approach achieves more than 3% MAP improvement while remaining competitive on traditional top-1 and top-5 accuracy metrics.
CV · Nov 19, 2018
Beyond Attributes: Adversarial Erasing Embedding Network for Zero-shot Learning
Xiao-Bo Jin, Kai-Zhu Huang, Jianyu Miao
In this paper, we propose an adversarial erasing embedding network guided by high-order attributes (AEEN-HOA) for the challenging zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) tasks. AEEN-HOA consists of two branches. The upper stream erases some initially discovered regions and then incorporates high-order attribute supervision to characterize the relationship between the class attributes. Meanwhile, the bottom stream is trained on the current background regions to learn the same attributes. As far as we know, this is the first time erasing operations have been introduced into the ZSL task. In addition, we are the first to propose a class attribute activation map for visualizing ZSL output, which shows the relationship between the class attribute feature and the attention map. Experiments on four standard benchmark datasets demonstrate the superiority of the AEEN-HOA framework.
LG · Oct 27, 2017
Stochastic Conjugate Gradient Algorithm with Variance Reduction
Xiao-Bo Jin, Xu-Yao Zhang, Kaizhu Huang et al.
Conjugate gradient (CG) methods are an important class of methods for solving linear equations and nonlinear optimization problems. In this paper, we propose a new stochastic CG algorithm with variance reduction, and we prove its linear convergence with the Fletcher and Reeves method for strongly convex and smooth functions. We experimentally demonstrate that the CG with variance reduction algorithm converges faster than its counterparts for four learning models, which may be convex, nonconvex or nonsmooth. In addition, its area under the curve performance on six large-scale data sets is comparable to that of the LIBLINEAR solver for the L2-regularized L2-loss, but with a significant improvement in computational efficiency.
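For orientation, the two standard ingredients named in this abstract can be written as follows. This is a sketch of the textbook forms only: the SVRG-style variance-reduced gradient and the Fletcher-Reeves coefficient are well known, but the symbols ($w_t$, $\tilde{w}$, $i_t$, $d_t$) and the way the two are combined here are assumptions, not the paper's exact algorithm.

$$
g_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \nabla F(\tilde{w}), \qquad
\beta_t = \frac{\lVert g_t \rVert^2}{\lVert g_{t-1} \rVert^2}, \qquad
d_t = -g_t + \beta_t d_{t-1}
$$

Here $F = \frac{1}{n}\sum_{i} f_i$ is the full objective, $\tilde{w}$ is a periodically refreshed snapshot at which the full gradient is computed, $i_t$ is the sampled index at step $t$, and $d_t$ is the search direction.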
IR · Aug 3, 2016
Ranking Entity Based on Both of Word Frequency and Word Sematic Features
Xiao-Bo Jin, Guang-Gang Geng, Kaizhu Huang et al.
Entity search is a new application that meets both precise and vague requirements from search engine users. The Baidu Cup 2016 Challenge provided an opportunity to tackle the entity search problem. We achieved first place in terms of the average MAP score over 4 tasks: movie, tvShow, celebrity and restaurant. In this paper, we propose a series of similarity features based on both word frequency features and word semantic features, and describe our ranking architecture and experimental details.
IR · Sep 12, 2013
Combination of Multiple Bipartite Ranking for Web Content Quality Evaluation
Xiao-Bo Jin, Guang-Gang Geng, Dexian Zhang
Web content quality estimation is crucial to various web content processing applications. Our previous work applied Bagging + C4.5, a combination of many point-wise ranking models, to achieve the best results on the ECML/PKDD Discovery Challenge 2010. In this paper, we combine multiple pair-wise bipartite ranking learners to solve the multi-partite ranking problem for web quality estimation. In the encoding stage, we present the ternary encoding and the binary coding, each extending a rank value to $L - 1$ values ($L$ is the number of distinct rank values). For decoding, we discuss combining the ranking results from multiple bipartite ranking models with predefined weighting and with adaptive weighting. The experiments on the ECML/PKDD 2010 Discovery Challenge datasets show that binary coding + predefined weighting yields the highest performance of all four combinations and, furthermore, is better than the best results reported in the ECML/PKDD 2010 Discovery Challenge competition.
CE · May 14, 2013
Qualitative detection of oil adulteration with machine learning approaches
Xiao-Bo Jin, Qiang Lu, Feng Wang et al.
This study focuses on machine learning approaches for qualitatively identifying the adulteration of 9 kinds of edible oil and answers three questions: Is the oil sample adulterated? What is it composed of? What is the main ingredient of the adulterated oil? After extracting high-performance liquid chromatography (HPLC) data on triglycerides from 370 oil samples, we applied adaptive boosting with multi-class Hamming loss (AdaBoost.MH) to detect oil adulteration and compared it with the support vector machine (SVM). Further, we regarded the adulterated oil samples as having multiple labels and the pure oil samples as having only one label. Multi-label AdaBoost.MH and multi-label learning vector quantization (ML-LVQ) models were then built to determine the ingredients of the adulterated oil and their relative ratios. The experimental results on six measures show that ML-LVQ achieves better performance than multi-label AdaBoost.MH.
IR · Apr 23, 2013
Evaluating Web Content Quality via Multi-scale Features
Guang-Gang Geng, Xiao-Bo Jin, Xin-Chang Zhang et al.
Web content quality measurement is crucial to various web content processing applications. This paper explores multi-scale features that may affect the quality of a host and develops automatic statistical methods to evaluate Web content quality. The extracted properties include statistical content features, page-level and host-level link features, and TF-IDF features. The experiments on the ECML/PKDD 2010 Discovery Challenge data set show that the algorithm is effective and feasible for the quality tasks in multiple languages, and that the multi-scale features differ in their discriminative ability and complement each other well for most tasks.
LG · Mar 11, 2013
Linear NDCG and Pair-wise Loss
Xiao-Bo Jin, Guang-Gang Geng
Linear NDCG is used for measuring the performance of the Web content quality assessment in ECML/PKDD Discovery Challenge 2010. In this paper, we will prove that the DCG error equals a new pair-wise loss.
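To make the quantities concrete, the sketch below computes DCG, NDCG, and the DCG error for a ranked list of ratings. The abstract does not define "linear" or the pair-wise loss, so two things here are assumptions: `linear_discount`, which weights position $i$ (0-based) by $n - i$, and `pairwise_loss`, which sums the rating gap over every pair ranked in the wrong order. All names are hypothetical.

```python
import math

def dcg(ratings, discount):
    """DCG of `ratings` listed in predicted rank order (index 0 = top)."""
    n = len(ratings)
    return sum(r * discount(i, n) for i, r in enumerate(ratings))

def log_discount(i, n):
    # The usual logarithmic discount, shown for comparison.
    return 1.0 / math.log2(i + 2)

def linear_discount(i, n):
    # Assumed reading of "linear": the weight falls linearly with position.
    return float(n - i)

def ndcg(ratings, discount):
    ideal = dcg(sorted(ratings, reverse=True), discount)
    return dcg(ratings, discount) / ideal if ideal > 0 else 0.0

def dcg_error(ratings, discount):
    """Gap between the ideal ordering's DCG and the predicted ordering's DCG."""
    return dcg(sorted(ratings, reverse=True), discount) - dcg(ratings, discount)

def pairwise_loss(ratings):
    """Sum of rating gaps over pairs ranked in the wrong order (assumed loss)."""
    return sum(ratings[j] - ratings[i]
               for i in range(len(ratings))
               for j in range(i + 1, len(ratings))
               if ratings[j] > ratings[i])

ranked = [1, 3, 2]
error = dcg_error(ranked, linear_discount)
loss = pairwise_loss(ranked)
```

Under the assumed linear discount, the DCG error and this pair-wise loss coincide for the lists tried here, which is in the spirit of the equivalence the abstract states; the paper's own definitions may differ.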