Yi Chang

CV
h-index30
17papers
665citations
Novelty48%
AI Score48

17 Papers

24.1LGJul 7, 2024
PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

Hangting Ye, Wei Fan, Xiaozhuang Song et al.

Tabular data have been playing a mostly important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from the representation entanglement and localization, which largely hinders their prediction performance and leads to performance inconsistency on tabular tasks. To overcome these problems, we explore a novel direction of applying prototype learning for tabular ML and propose a prototype-based tabular representation learning framework, PTaRL, for tabular prediction tasks. The core idea of PTaRL is to construct prototype-based projection space (P-Space) and learn the disentangled representation around global data prototypes. Specifically, PTaRL mainly involves two stages: (i) Prototype Generation, that constructs global prototypes as the basis vectors of P-Space for representation, and (ii) Prototype Projection, that projects the data samples into P-Space and keeps the core global data information via Optimal Transport. Then, to further acquire the disentangled representations, we constrain PTaRL with two strategies: (i) to diversify the coordinates towards global prototypes of different representations within P-Space, we bring up a diversification constraint for representation calibration; (ii) to avoid prototype entanglement in P-Space, we introduce a matrix orthogonalization constraint to ensure the independence of global prototypes. Finally, we conduct extensive experiments in PTaRL coupled with state-of-the-art deep tabular ML models on various tabular benchmarks and the results have shown our consistent superiority.

35.3CLJan 21, 2025Code
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models

Qinggang Zhang, Shengyuan Chen, Yuanchen Bei et al.

Large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks, yet their application to specialized domains remains challenging due to the need for deep expertise. Retrieval-Augmented generation (RAG) has emerged as a promising solution to customize LLMs for professional fields by seamlessly integrating external knowledge bases, enabling real-time access to domain-specific expertise during inference. Despite its potential, traditional RAG systems, based on flat text retrieval, face three critical challenges: (i) complex query understanding in professional contexts, (ii) difficulties in knowledge integration across distributed sources, and (iii) system efficiency bottlenecks at scale. This survey presents a systematic analysis of Graph-based Retrieval-Augmented Generation (GraphRAG), a new paradigm that revolutionizes domain-specific LLM applications. GraphRAG addresses traditional RAG limitations through three key innovations: (i) graph-structured knowledge representation that explicitly captures entity relationships and domain hierarchies, (ii) efficient graph-based retrieval techniques that enable context-preserving knowledge retrieval with multihop reasoning ability, and (iii) structure-aware knowledge integration algorithms that leverage retrieved knowledge for accurate and logical coherent generation of LLMs. In this survey, we systematically analyze the technical foundations of GraphRAG and examine current implementations across various professional domains, identifying key technical challenges and promising research directions. All the related resources of GraphRAG, including research papers, open-source data, and projects, are collected for the community in https://github.com/DEEP-PolyU/Awesome-GraphRAG.

10.2CVNov 11, 2025
Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection

Houzhang Fang, Shukai Guo, Qiuhuan Chen et al.

Moving infrared small target detection (IRSTD) plays a critical role in practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and UAV-based search system. Moving IRSTD still remains highly challenging due to weak target features and complex background interference. Accurate spatio-temporal feature modeling is crucial for moving target detection, typically achieved through either temporal differences or spatio-temporal (3D) convolutions. Temporal difference can explicitly leverage motion cues but exhibits limited capability in extracting spatial features, whereas 3D convolution effectively represents spatio-temporal features yet lacks explicit awareness of motion dynamics along the temporal dimension. In this paper, we propose a novel moving IRSTD network (TDCNet), which effectively extracts and enhances spatio-temporal features for accurate target detection. Specifically, we introduce a novel temporal difference convolution (TDC) re-parameterization module that comprises three parallel TDC blocks designed to capture contextual dependencies across different temporal ranges. Each TDC block fuses temporal difference and 3D convolution into a unified spatio-temporal convolution representation. This re-parameterized module can effectively capture multi-scale motion contextual features while suppressing pseudo-motion clutter in complex backgrounds, significantly improving detection performance. Moreover, we propose a TDC-guided spatio-temporal attention mechanism that performs cross-attention between the spatio-temporal features from the TDC-based backbone and a parallel 3D backbone. This mechanism models their global semantic dependencies to refine the current frame's features. Extensive experiments on IRSTD-UAV and public infrared datasets demonstrate that our TDCNet achieves state-of-the-art detection performance in moving target detection.

22.6CLFeb 20, 2025Code
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following

Jinnan Li, Jinzhe Li, Yue Wang et al.

Multi-turn instruction following capability constitutes a core competency of large language models (LLMs) in real-world applications. Existing evaluation benchmarks predominantly focus on fine-grained constraint satisfaction and domain-specific capability assessment, yet overlook the crucial structural dependencies between dialogue turns that distinguish multi-turn from single-turn interactions. These structural dependencies not only reflect user intent but also establish an essential second dimension for the instruction following evaluation beyond constraint satisfaction. To address this gap, we propose StructFlowBench, a multi-turn instruction following benchmark with structural flow modeling. The benchmark defines an innovative structural flow framework with six fundamental inter-turn relationships. These relationships introduce novel structural constraints for model evaluation and also serve as generation parameters for creating customized dialogue flows tailored to specific scenarios. Adopting established LLM-based automatic evaluation methodologies, we conduct systematic evaluations of 13 leading open-source and closed-source LLMs. Experimental results reveal significant deficiencies in current models' comprehension of multi-turn dialogue structures. The code is available at https://github.com/MLGroupJLU/StructFlowBench.

7.3IVOct 14, 2023
B-Spine: Learning B-Spline Curve Representation for Robust and Interpretable Spinal Curvature Estimation

Hao Wang, Qiang Song, Ruofeng Yin et al.

Spinal curvature estimation is important to the diagnosis and treatment of the scoliosis. Existing methods face several issues such as the need of expensive annotations on the vertebral landmarks and being sensitive to the image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images which are blurry and hazy. In this paper, we propose B-Spine, a novel deep learning pipeline to learn B-spline curve representation of the spine and estimate the Cobb angles for spinal curvature estimation from low-quality X-ray images. Given a low-quality input, a novel SegRefine network which employs the unpaired image-to-image translation is proposed to generate a high quality spine mask from the initial segmentation result. Next, a novel mask-based B-spline prediction model is proposed to predict the B-spline curve for the spine centerline. Finally, the Cobb angles are estimated by a hybrid approach which combines the curve slope analysis and a curve-based regression model. We conduct quantitative and qualitative comparisons with the representative and SOTA learning-based methods on the public AASCE2019 dataset and our new proposed CJUH-JLU dataset which contains more challenging low-quality images. The superior performance on both datasets shows our method can achieve both robustness and interpretability for spinal curvature estimation.

3.4CLOct 14, 2024Code
Large Language Model Evaluation via Matrix Nuclear-Norm

Yahan Li, Tingyu Xia, Yi Chang et al.

As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. While traditional metrics like Matrix Entropy offer valuable insights, they are computationally intensive for large-scale models due to their \( O(n^3) \) time complexity with Singular Value Decomposition (SVD). To mitigate this issue, we introduce the Matrix Nuclear-Norm, which not only serves as a metric to quantify the data compression proficiency of LLM but also provides a convex approximation of matrix rank to capture both predictive discriminability and diversity. By employing the \( L_{1,2}\text{-norm} \) to further approximate the nuclear norm, we can effectively assess the model's information compression capabilities. This approach reduces the time complexity to \( O(n^2) \) and eliminates the need for SVD computation. Consequently, the Matrix Nuclear-Norm achieves speeds 8 to 24 times faster than Matrix Entropy for the CEREBRAS-GPT model as sizes increase from 111M to 6.7B. This performance gap becomes more pronounced with larger models, as validated in tests with other models like Pythia. Additionally, evaluations on benchmarks and model responses confirm that our proposed Matrix Nuclear-Norm is a reliable, scalable, and efficient tool for assessing LLMs' performance, striking a balance between accuracy and computational efficiency. The code is available at https://github.com/MLGroupJLU/MatrixNuclearNorm.

1.4LGFeb 6
Calibrating Tabular Anomaly Detection via Optimal Transport

Hangting Ye, He Zhao. Wei Fan, Xiaozhuang Song et al.

Tabular anomaly detection (TAD) remains challenging due to the heterogeneity of tabular data: features lack natural relationships, vary widely in distribution and scale, and exhibit diverse types. Consequently, each TAD method makes implicit assumptions about anomaly patterns that work well on some datasets but fail on others, and no method consistently outperforms across diverse scenarios. We present CTAD (Calibrating Tabular Anomaly Detection), a model-agnostic post-processing framework that enhances any existing TAD detector through sample-specific calibration. Our approach characterizes normal data via two complementary distributions, i.e., an empirical distribution from random sampling and a structural distribution from K-means centroids, and measures how adding a test sample disrupts their compatibility using Optimal Transport (OT) distance. Normal samples maintain low disruption while anomalies cause high disruption, providing a calibration signal to amplify detection. We prove that OT distance has a lower bound proportional to the test sample's distance from centroids, and establish that anomalies systematically receive higher calibration scores than normals in expectation, explaining why the method generalizes across datasets. Extensive experiments on 34 diverse tabular datasets with 7 representative detectors spanning all major TAD categories (density estimation, classification, reconstruction, and isolation-based methods) demonstrate that CTAD consistently improves performance with statistical significance. Remarkably, CTAD enhances even state-of-the-art deep learning methods and shows robust performance across diverse hyperparameter settings, requiring no additional tuning for practical deployment.

15.3CVMar 12, 2024
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow

Hanyu Zhou, Yi Chang, Zhiwei Shi et al.

Single RGB or LiDAR is the mainstream sensor for the challenging scene flow, which relies heavily on visual features to match motion features. Compared with single modality, existing methods adopt a fusion strategy to directly fuse the cross-modal complementary knowledge in motion space. However, these direct fusion methods may suffer the modality gap due to the visual intrinsic heterogeneous nature between RGB and LiDAR, thus deteriorating motion features. We discover that event has the homogeneous nature with RGB and LiDAR in both visual and motion spaces. In this work, we bring the event as a bridge between RGB and LiDAR, and propose a novel hierarchical visual-motion fusion framework for scene flow, which explores a homogeneous space to fuse the cross-modal complementary knowledge for physical interpretation. In visual fusion, we discover that event has a complementarity (relative v.s. absolute) in luminance space with RGB for high dynamic imaging, and has a complementarity (local boundary v.s. global shape) in scene structure space with LiDAR for structure integrity. In motion fusion, we figure out that RGB, event and LiDAR are complementary (spatial-dense, temporal-dense v.s. spatiotemporal-sparse) to each other in correlation space, which motivates us to fuse their motion correlations for motion continuity. The proposed hierarchical fusion can explicitly fuse the multimodal knowledge to progressively improve scene flow from visual space to motion space. Extensive experiments have been performed to verify the superiority of the proposed method.

15.9SEMar 26, 2025
Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models

Siyuan Guo, Huiwu Liu, Xiaolong Chen et al.

In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which necessitates understanding the dynamically evolving code structure of the target software. To achieve this, we propose a case-based reasoning (CBR) system utilizing a 4R cycle (i.e., retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and corresponding test scripts to facilitate LLMs for test script generation. To improve user experience further, we introduce Re4, an optimization method for the CBR system, comprising reranking-based retrieval finetuning and reinforced reuse finetuning. Specifically, we first identify positive examples with high semantic and script similarity, providing reliable pseudo-labels for finetuning the retriever model without costly labeling. Then, we apply supervised finetuning, followed by a reinforcement learning finetuning stage, to align LLMs with our production scenarios, ensuring the faithful reuse of retrieved cases. Extensive experimental results on two product development units from Huawei Datacom demonstrate the superiority of the proposed CBR+Re4. Notably, we also show that the proposed Re4 method can help alleviate the repetitive generation issues with LLMs.

3.7CVMar 18, 2024
GT-Rain Single Image Deraining Challenge Report

Howard Zhang, Yunhao Ba, Ethan Yang et al.

This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain are comprised of real rainy image and ground truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.

2.3SDAug 6, 2021
The EIHW-GLAM Deep Attentive Multi-model Fusion System for Cough-based COVID-19 Recognition in the DiCOVA 2021 Challenge

Zhao Ren, Yi Chang, Björn W. Schuller

Aiming to automatically detect COVID-19 from cough sounds, we propose a deep attentive multi-model fusion system evaluated on the Track-1 dataset of the DiCOVA 2021 challenge. Three kinds of representations are extracted, including hand-crafted features, image-from-audio-based deep representations, and audio-based deep representations. Afterwards, the best models on the three types of features are fused at both the feature level and the decision level. The experimental results demonstrate that the proposed attention-based fusion at the feature level achieves the best performance (AUC: 77.96%) on the test set, resulting in an 8.05% improvement over the official baseline.

18.1IVMay 29, 2021
Self-Supervised Nonlinear Transform-Based Tensor Nuclear Norm for Multi-Dimensional Image Recovery

Yi-Si Luo, Xi-Le Zhao, Tai-Xiang Jiang et al.

In this paper, we study multi-dimensional image recovery. Recently, transform-based tensor nuclear norm minimization methods are considered to capture low-rank tensor structures to recover third-order tensors in multi-dimensional image processing applications. The main characteristic of such methods is to perform the linear transform along the third mode of third-order tensors, and then compute tensor nuclear norm minimization on the transformed tensor so that the underlying low-rank tensors can be recovered. The main aim of this paper is to propose a nonlinear multilayer neural network to learn a nonlinear transform via the observed tensor data under self-supervision. The proposed network makes use of low-rank representation of transformed tensors and data-fitting between the observed tensor and the reconstructed tensor to construct the nonlinear transformation. Extensive experimental results on tensor completion, background subtraction, robust tensor completion, and snapshot compressive imaging are presented to demonstrate that the performance of the proposed method is better than that of state-of-the-art methods.

24.7IVMar 25, 2021
Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation

Yuntong Ye, Yi Chang, Hanyu Zhou et al.

Existing deep learning-based image deraining methods have achieved promising performance for synthetic rainy images, typically rely on the pairs of sharp images and simulated rainy counterparts. However, these methods suffer from significant performance drop when facing the real rain, because of the huge gap between the simplified synthetic rain and the complex real rain. In this work, we argue that the rain generation and removal are the two sides of the same coin and should be tightly coupled. To close the loop, we propose to jointly learn real rain generation and removal procedure within a unified disentangled image translation framework. Specifically, we propose a bidirectional disentangled translation network, in which each unidirectional network contains two loops of joint rain generation and removal for both the real and synthetic rain image, respectively. Meanwhile, we enforce the disentanglement strategy by decomposing the rainy image into a clean background and rain layer (rain removal), in order to better preserve the identity background via both the cycle-consistency loss and adversarial loss, and ease the rain layer translating between the real and synthetic rainy image. A counterpart composition with the entanglement strategy is symmetrically applied for rain generation. Extensive experiments on synthetic and real-world rain datasets show the superiority of proposed method compared to state-of-the-arts.

1.2CVAug 22, 2020
Unsupervised Hyperspectral Mixed Noise Removal Via Spatial-Spectral Constrained Deep Image Prior

Yi-Si Luo, Xi-Le Zhao, Tai-Xiang Jiang et al.

Recently, convolutional neural network (CNN)-based methods are proposed for hyperspectral images (HSIs) denoising. Among them, unsupervised methods such as the deep image prior (DIP) have received much attention because these methods do not require any training data. However, DIP suffers from the semi-convergence behavior, i.e., the iteration of DIP needs to terminate by referring to the ground-truth image at the optimal iteration point. In this paper, we propose the spatial-spectral constrained deep image prior (S2DIP) for HSI mixed noise removal. Specifically, we incorporate DIP with a spatial-spectral total variation (SSTV) term to fully preserve the spatial-spectral local smoothness of the HSI and an $\ell_1$-norm term to capture the complex sparse noise. The proposed S2DIP jointly leverages the expressive power brought from the deep CNN without any training data and exploits the HSI and noise structures via hand-crafted priors. Thus, our method avoids the semi-convergence behavior, showing higher stabilities than DIP. Meanwhile, our method largely enhances the HSI denoising ability of DIP. To tackle the proposed denoising model, we develop an alternating direction multiplier method algorithm. Extensive experiments demonstrate that the proposed S2DIP outperforms optimization-based and supervised CNN-based state-of-the-art HSI denoising methods.

5.2CVAug 26, 2018
Rain Streak Removal for Single Image via Kernel Guided CNN

Ye-Tao Wang, Xi-Le Zhao, Tai-Xiang Jiang et al.

Rain streak removal is an important issue and has recently been investigated extensively. Existing methods, especially the newly emerged deep learning methods, could remove the rain streaks well in many cases. However the essential factor in the generative procedure of the rain streaks, i.e., the motion blur, which leads to the line pattern appearances, were neglected by the deep learning rain streaks approaches and this resulted in over-derain or under-derain results. In this paper, we propose a novel rain streak removal approach using a kernel guided convolutional neural network (KGCNN), achieving the state-of-the-art performance with simple network architectures. We first model the rain streak interference with its motion blur mechanism. Then, our framework starts with learning the motion blur kernel, which is determined by two factors including angle and length, by a plain neural network, denoted as parameter net, from a patch of the texture component. Then, after a dimensionality stretching operation, the learned motion blur kernel is stretched into a degradation map with the same spatial size as the rainy patch. The stretched degradation map together with the texture patch is subsequently input into a derain convolutional network, which is a typical ResNet architecture and trained to output the rain streaks with the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed method, which preserves the texture and the contrast while removing the rain streaks.

15.0CVSep 1, 2017
Weighted Low-rank Tensor Recovery for Hyperspectral Image Restoration

Yi Chang, Luxin Yan, Houzhang Fang et al.

Hyperspectral imaging, providing abundant spatial and spectral information simultaneously, has attracted a lot of interest in recent years. Unfortunately, due to the hardware limitations, the hyperspectral image (HSI) is vulnerable to various degradations, such noises (random noise, HSI denoising), blurs (Gaussian and uniform blur, HSI deblurring), and down-sampled (both spectral and spatial downsample, HSI super-resolution). Previous HSI restoration methods are designed for one specific task only. Besides, most of them start from the 1-D vector or 2-D matrix models and cannot fully exploit the structurally spectral-spatial correlation in 3-D HSI. To overcome these limitations, in this work, we propose a unified low-rank tensor recovery model for comprehensive HSI restoration tasks, in which non-local similarity between spectral-spatial cubic and spectral correlation are simultaneously captured by 3-order tensors. Further, to improve the capability and flexibility, we formulate it as a weighted low-rank tensor recovery (WLRTR) model by treating the singular values differently, and study its analytical solution. We also consider the exclusive stripe noise in HSI as the gross error by extending WLRTR to robust principal component analysis (WLRTR-RPCA). Extensive experiments demonstrate the proposed WLRTR models consistently outperform state-of-the-arts in typical low level vision HSI tasks, including denoising, destriping, deblurring and super-resolution.

14.1CVJul 21, 2016
Hierarchical Attention Network for Action Recognition in Videos

Yilin Wang, Suhang Wang, Jiliang Tang et al.

Understanding human actions in wild videos is an important task with a broad range of applications. In this paper we propose a novel approach named Hierarchical Attention Network (HAN), which enables to incorporate static spatial information, short-term motion information and long-term video temporal structures for complex human action understanding. Compared to recent convolutional neural network based approaches, HAN has following advantages (1) HAN can efficiently capture video temporal structures in a longer range; (2) HAN is able to reveal temporal transitions between frame chunks with different time steps, i.e. it explicitly models the temporal transitions between frames as well as video segments and (3) with a multiple step spatial temporal attention mechanism, HAN automatically learns important regions in video frames and temporal segments in the video. The proposed model is trained and evaluated on the standard video action benchmarks, i.e., UCF-101 and HMDB-51, and it significantly outperforms the state-of-the arts