Jiang Zhu

CV
h-index40
14papers
347citations
Novelty50%
AI Score45

14 Papers

ITOct 17, 2022
A Unitary Transform Based Generalized Approximate Message Passing

Jiang Zhu, Xiangming Meng, Xupeng Lei et al.

We consider the problem of recovering an unknown signal ${\mathbf x}\in {\mathbb R}^n$ from general nonlinear measurements obtained through a generalized linear model (GLM), i.e., ${\mathbf y}= f\left({\mathbf A}{\mathbf x}+{\mathbf w}\right)$, where $f(\cdot)$ is a componentwise nonlinear function. Based on the unitary transform approximate message passing (UAMP) and expectation propagation, a unitary transform based generalized approximate message passing (GUAMP) algorithm is proposed for general measurement matrices $\bf{A}$, in particular highly correlated matrices. Experimental results on quantized compressed sensing demonstrate that the proposed GUAMP significantly outperforms state-of-the-art GAMP and GVAMP under correlated matrices $\bf{A}$.

LGJan 6, 2023
Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials

Andrew Cohen, Weiping Dou, Jiang Zhu et al.

Linear Partial Differential Equations (PDEs) govern the spatial-temporal dynamics of physical systems that are essential to building modern technology. When working with linear PDEs, designing a physical system for a specific outcome is difficult and costly due to slow and expensive explicit simulation of PDEs and the highly nonlinear relationship between a system's configuration and its behavior. In this work, we prove a parametric form that certain physical quantities in the Fourier domain must obey in linear PDEs, named the CZP (Constant-Zeros-Poles) framework. Applying CZP to antenna design, an industrial application using linear PDEs (i.e., Maxwell's equations), we derive a sample-efficient parametric surrogate model that directly predicts its scattering coefficients without explicit numerical PDE simulation. Combined with a novel image-based antenna representation and an attention-based neural network architecture, CZP outperforms baselines by 10% to 25% in terms of test loss and also is able to find 2D antenna designs verifiable by commercial software with $33\%$ greater success than baselines, when coupled with sequential search techniques like reinforcement learning.

72.6NAMay 12
A novel energy-conservation Crank-Nicolson finite element method for generalized Klein-Gordon-Zakharov equations

Xuemiao Xu, Maosheng Jiang, Jiansong Zhang et al.

This article focuses on an energy-conservation Galerkin finite element method (FEM) for the generalized Klein-Gordon-Zakharov (KGZ) equations. This method combines the bilinear finite element method for spatial discretization with the Crank-Nicolson (CN) scheme for temporal discretization, thereby guaranteeing exact conservation of the discrete energy functional. A rigorous theoretical analysis is devoted to deriving error bounds for the fast-time-scale electronic field $u$ and the ion density deviation $φ$. By systematically integrating interpolation estimates, Ritz projection, and a postprocessing technique, the superclose error estimates and global superconvergence are established for $u$ in the $H^1$-norm, even under weakened regularity assumptions on the exact solution. Concurrently, we prove $H^1$-norm superconvergence for the auxiliary variable $ϕ$ ($-Δϕ= φ_t$) and optimal-order $L^2$-norm error estimates for the auxiliary variable $p$ ($p=u_t$) and $φ$. Numerical examples are provided to confirm theoretical results.

CLDec 15, 2023
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

Shihan Dou, Enyu Zhou, Yan Liu et al.

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novelty framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network, like a plugin version of Mixture of Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks, to alleviate world knowledge-edge forgetting. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks, while maintaining the world knowledge stored in the LLM.

CVNov 13, 2024
LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation

Pengwei Yin, Jingjing Wang, Guanzhong Zeng et al.

The ability of gaze estimation models to generalize is often significantly hindered by various factors unrelated to gaze, especially when the training dataset is limited. Current strategies aim to address this challenge through different domain generalization techniques, yet they have had limited success due to the risk of overfitting when solely relying on value labels for regression. Recent progress in pre-trained vision-language models has motivated us to capitalize on the abundant semantic information available. We propose a novel approach in this paper, reframing the gaze estimation task as a vision-language alignment issue. Our proposed framework, named Language-Guided Gaze Estimation (LG-Gaze), learns continuous and geometry-sensitive features for gaze estimation benefit from the rich prior knowledges of vision-language models. Specifically, LG-Gaze aligns gaze features with continuous linguistic features through our proposed multimodal contrastive regression loss, which customizes adaptive weights for different negative samples. Furthermore, to better adapt to the labels for gaze estimation task, we propose a geometry-aware interpolation method to obtain more precise gaze embeddings. Through extensive experiments, we validate the efficacy of our framework in four different cross-domain evaluation tasks.

CVDec 20, 2024
Gaze Label Alignment: Alleviating Domain Shift for Gaze Estimation

Guanzhong Zeng, Jingjing Wang, Zefu Xu et al.

Gaze estimation methods encounter significant performance deterioration when being evaluated across different domains, because of the domain gap between the testing and training data. Existing methods try to solve this issue by reducing the deviation of data distribution, however, they ignore the existence of label deviation in the data due to the acquisition mechanism of the gaze label and the individual physiological differences. In this paper, we first point out that the influence brought by the label deviation cannot be ignored, and propose a gaze label alignment algorithm (GLA) to eliminate the label distribution deviation. Specifically, we first train the feature extractor on all domains to get domain invariant features, and then select an anchor domain to train the gaze regressor. We predict the gaze label on remaining domains and use a mapping function to align the labels. Finally, these aligned labels can be used to train gaze estimation models. Therefore, our method can be combined with any existing method. Experimental results show that our GLA method can effectively alleviate the label distribution shift, and SOTA gaze estimation methods can be further improved obviously.

AIFeb 10, 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective

Meilin Chen, Jian Tian, Liang Ma et al.

Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designing unbiased evaluation protocols. Furthermore, we identify two type of bias in Agents-as-an-Evaluator through carefully designed probing tasks on a minimal Agents-as-an-Evaluator setup. To address these issues, we propose the Unbiased Evaluator, an evaluation protocol that delivers a more comprehensive, unbiased, and interpretable assessment of LLMs.Extensive experiments reveal significant room for improvement in current LLMs. Additionally, we demonstrate that the Unbiased Evaluator not only offers strong evidence of benchmark contamination but also provides interpretable evaluation results.

CVApr 8, 2024
MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues

Xiahan Chen, Mingjian Chen, Sanli Tang et al.

3D object detection based on roadside cameras is an additional way for autonomous driving to alleviate the challenges of occlusion and short perception range from vehicle cameras. Previous methods for roadside 3D object detection mainly focus on modeling the depth or height of objects, neglecting the stationary of cameras and the characteristic of inter-frame consistency. In this work, we propose a novel framework, namely MOSE, for MOnocular 3D object detection with Scene cuEs. The scene cues are the frame-invariant scene-specific features, which are crucial for object localization and can be intuitively regarded as the height between the surface of the real road and the virtual ground plane. In the proposed framework, a scene cue bank is designed to aggregate scene cues from multiple frames of the same scene with a carefully designed extrinsic augmentation strategy. Then, a transformer-based decoder lifts the aggregated scene cues as well as the 3D position embeddings for 3D object location, which boosts generalization ability in heterologous scenes. The extensive experiment results on two public benchmarks demonstrate the state-of-the-art performance of the proposed method, which surpasses the existing methods by a large margin.

CVApr 24, 2025
The Fourth Monocular Depth Estimation Challenge

Anton Obukhov, Matteo Poggi, Fabio Tosi et al.

This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and affine-invariant predictions. We also revised the baselines and included popular off-the-shelf methods: Depth Anything v2 and Marigold. The challenge received a total of 24 submissions that outperformed the baselines on the test set; 10 of these included a report describing their approach, with most leading methods relying on affine-invariant predictions. The challenge winners improved the 3D F-Score over the previous edition's best result, raising it from 22.58% to 23.05%.

SPDec 7, 2024
DM-SBL: Channel Estimation under Structured Interference

Yifan Wang, Chengjie Yu, Jiang Zhu et al.

Channel estimation is a fundamental task in communication systems and is critical for effective demodulation. While most works deal with a simple scenario where the measurements are corrupted by the additive white Gaussian noise (AWGN), this work addresses the more challenging scenario where both AWGN and structured interference coexist. Such conditions arise, for example, when a sonar/radar transmitter and a communication receiver operate simultaneously within the same bandwidth. To ensure accurate channel estimation in these scenarios, the sparsity of the channel in the delay domain and the complicate structure of the interference are jointly exploited. Firstly, the score of the structured interference is learned via a neural network based on the diffusion model (DM), while the channel prior is modeled as a Gaussian distribution, with its variance controlling channel sparsity, similar to the setup of the sparse Bayesian learning (SBL). Then, two efficient posterior sampling methods are proposed to jointly estimate the sparse channel and the interference. Nuisance parameters, such as the variance of the prior are estimated via the expectation maximization (EM) algorithm. The proposed method is termed as DM based SBL (DM-SBL). Numerical simulations demonstrate that DM-SBL significantly outperforms conventional approaches that deal with the AWGN scenario, particularly under low signal-to-interference ratio (SIR) conditions. Beyond channel estimation, DM-SBL also shows promise for addressing other linear inverse problems involving structured interference.

CVDec 16, 2023
Image Classifier Based Generative Method for Planar Antenna Design

Yang Zhong, Weiping Dou, Andrew Cohen et al.

To extend the antenna design on printed circuit boards (PCBs) for more engineers of interest, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics relate to the quality of dimensions are used in selecting among dimension candidates. A novel image-based classifier using a convolutional neural network (CNN) is introduced to further determine the positions of these fixed-dimension components. Two examples from wearable products have been chosen to examine the entire workflow. Their final designs are realistic and their performance metrics are not inferior to the ones designed by experienced engineers.

LGFeb 28, 2022
A Machine Learning Generative Method for Automating Antenna Design and Optimization

Yang Zhong, Peter Renner, Weiping Dou et al.

To facilitate the antenna design with the aid of computer, one of the practices in consumer electronic industry is to model and optimize antenna performances with a simplified antenna geometric scheme. Traditional antenna modeling requires profound prior knowledge of electromagnetics in order to achieve a good design which satisfies the performance specifications from both antenna and product designs. The ease of handling multidimensional optimization problems and the less dependence on domain knowledge and experience are the key to achieve the popularity of simulation driven antenna design and optimization for the industry. In this paper, we introduce a flexible geometric scheme with the concept of mesh network that can form any arbitrary shape by connecting different nodes. For such problems with high dimensional parameters, we propose a machine learning based generative method to assist the searching of optimal solutions. It consists of discriminators and generators. The discriminators are used to predict the performance of geometric models, and the generators to create new candidates that will pass the discriminators. Moreover, an evolutionary criterion approach is proposed for further improving the efficiency of our method. Finally, not only optimal solutions can be found, but also the well trained generators can be used to automate future antenna design and optimization. For a dual resonance antenna design with wide bandwidth, our proposed method is in par with Trust Region Framework and much better than the other mature machine learning algorithms including the widely used Genetic Algorithm and Particle Swarm Optimization. When there is no wide bandwidth requirement, it is better than Trust Region Framework.

IVJul 12, 2021
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery

Rongkai Zhang, Jiang Zhu, Zhiyuan Zha et al.

State-of-the-art image denoisers exploit various types of deep neural networks via deterministic training. Alternatively, very recent works utilize deep reinforcement learning for restoring images with diverse or unknown corruptions. Though deep reinforcement learning can generate effective policy networks for operator selection or architecture search in image restoration, how it is connected to the classic deterministic training in solving inverse problems remains unclear. In this work, we propose a novel image denoising scheme via Residual Recovery using Reinforcement Learning, dubbed R3L. We show that R3L is equivalent to a deep recurrent neural network that is trained using a stochastic reward, in contrast to many popular denoisers using supervised learning with deterministic losses. To benchmark the effectiveness of reinforcement learning in R3L, we train a recurrent neural network with the same architecture for residual recovery using the deterministic loss, thus to analyze how the two different training strategies affect the denoising performance. With such a unified benchmarking system, we demonstrate that the proposed R3L has better generalizability and robustness in image denoising when the estimated noise level varies, comparing to its counterparts using deterministic training, as well as various state-of-the-art image denoising algorithms.

MMAug 10, 2016
Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data

Yuncheng Li, LiangLiang Cao, Jiang Zhu et al.

Composing fashion outfits involves deep understanding of fashion standards while incorporating creativity for choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). In fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data. We propose to leverage outfit popularity on fashion oriented websites to supervise the scoring component. The scoring component is a multi-modal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. In order to train and evaluate the proposed composition system, we have collected a large scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although the fashion outfit scoring and composition is rather challenging, we have achieved an AUC of 85% for the scoring component, and an accuracy of 77% for a constrained composition task.