Shixiong Zhang

7papers

182citations

Novelty36%

AI Score41

Ranked #94,524 of 201,326 authors (top 47%)#453 in AS (top 33%)

7 Papers

CVJul 18, 2023

Survey on Controlable Image Synthesis with Deep Learning

Shixiong Zhang, Jiao Li, Lu Yang

Image synthesis has attracted emerging research interests in academic and industry communities. Deep learning technologies especially the generative models greatly inspired controllable image synthesis approaches and applications, which aim to generate particular visual contents with latent prompts. In order to further investigate low-level controllable image synthesis problem which is crucial for fine image rendering and editing tasks, we present a survey of some recent works on 3D controllable image synthesis using deep learning. We first introduce the datasets and evaluation indicators for 3D controllable image synthesis. Then, we review the state-of-the-art research for geometrically controllable image synthesis in two aspects: 1) Viewpoint/pose-controllable image synthesis; 2) Structure/shape-controllable image synthesis. Furthermore, the photometrically controllable image synthesis approaches are also reviewed for 3D re-lighting researches. While the emphasis is on 3D controllable image synthesis algorithms, the related applications, products and resources are also briefly summarized for practitioners.

61.5CLApr 9

Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

Chia-Hsuan Lee, Mingyang Zhou, Renkun Ni et al.

Preference optimization methods such as DPO and KTO are widely used for aligning language models, yet little is understood about what properties of preference data drive downstream reasoning gains. We ask: what aspects of a preference pair improve a reasoning model's performance on general reasoning tasks? We investigate two distinct notions of quality delta in preference data: generator-level delta, arising from the differences in capability between models that generate chosen and rejected reasoning traces, and sample-level delta, arising from differences in judged quality differences within an individual preference pair. To study generator-level delta, we vary the generator's scale and model family, and to study sample-level delta, we employ an LLM-as-a-judge to rate the quality of generated traces along multiple reasoning-quality dimensions. We find that increasing generator-level delta steadily improves performance on out-of-domain reasoning tasks and filtering data by sample-level delta can enable more data-efficient training. Our results suggest a twofold recipe for improving reasoning performance through preference optimization: maximize generator-level delta when constructing preference pairs and exploit sample-level delta to select the most informative training examples.

CRSep 4, 2025

LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning

Zeyu Yan, Yanfei Yao, Xuanbing Wen et al.

Vertical Federated Learning (VFL) has emerged as a promising paradigm for collaborative model training across distributed feature spaces, which enables privacy-preserving learning without sharing raw data. However, recent studies have confirmed the feasibility of label inference attacks by internal adversaries. By strategically exploiting gradient vectors and semantic embeddings, attackers-through passive, active, or direct attacks-can accurately reconstruct private labels, leading to catastrophic data leakage. Existing defenses, which typically address isolated leakage vectors or are designed for specific types of attacks, remain vulnerable to emerging hybrid attacks that exploit multiple pathways simultaneously. To bridge this gap, we propose Label-Anonymized Defense with Substitution Gradient (LADSG), a unified and lightweight defense framework for VFL. LADSG first anonymizes true labels via soft distillation to reduce semantic exposure, then generates semantically-aligned substitute gradients to disrupt gradient-based leakage, and finally filters anomalous updates through gradient norm detection. It is scalable and compatible with standard VFL pipelines. Extensive experiments on six real-world datasets show that LADSG reduces the success rates of all three types of label inference attacks by 30-60% with minimal computational overhead, demonstrating its practical effectiveness.

ASJun 7, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

Zheshu Song, Jianheng Zhuo, Yifan Yang et al.

Recent years have witnessed significant progress in multilingual automatic speech recognition (ASR), driven by the emergence of end-to-end (E2E) models and the scaling of multilingual datasets. Despite that, two main challenges persist in multilingual ASR: language interference and the incorporation of new languages without degrading the performance of the existing ones. This paper proposes LoRA-Whisper, which incorporates LoRA matrix into Whisper for multilingual ASR, effectively mitigating language interference. Furthermore, by leveraging LoRA and the similarities between languages, we can achieve better performance on new languages while upholding consistent performance on original ones. Experiments on a real-world task across eight languages demonstrate that our proposed LoRA-Whisper yields a relative gain of 18.5% and 23.0% over the baseline system for multilingual ASR and language expansion respectively.

ASApr 2, 2021

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

Meng Yu, Chunlei Zhang, Yong Xu et al.

The objective speech quality assessment is usually conducted by comparing received speech signal with its clean reference, while human beings are capable of evaluating the speech quality without any reference, such as in the mean opinion score (MOS) tests. Non-intrusive speech quality assessment has attracted much attention recently due to the lack of access to clean reference signals for objective evaluations in real scenarios. In this paper, we propose a novel non-intrusive speech quality measurement model, MetricNet, which leverages label distribution learning and joint speech reconstruction learning to achieve significantly improved performance compared to the existing non-intrusive speech quality measurement models. We demonstrate that the proposed approach yields promisingly high correlation to the intrusive objective evaluation of speech quality on clean, noisy and processed speech data.

ASNov 18, 2020

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation

Zhaoheng Ni, Yong Xu, Meng Yu et al.

This paper aims at eliminating the interfering speakers' speech, additive noise, and reverberation from the noisy multi-talker speech mixture that benefits automatic speech recognition (ASR) backend. While the recently proposed Weighted Power minimization Distortionless response (WPD) beamformer can perform separation and dereverberation simultaneously, the noise cancellation component still has the potential to progress. We propose an improved neural WPD beamformer called "WPD++" by an enhanced beamforming module in the conventional WPD and a multi-objective loss function for the joint training. The beamforming module is improved by utilizing the spatio-temporal correlation. A multi-objective loss, including the complex spectra domain scale-invariant signal-to-noise ratio (C-Si-SNR) and the magnitude domain mean square error (Mag-MSE), is properly designed to make multiple constraints on the enhanced speech and the desired power of the dry clean signal. Joint training is conducted to optimize the complex-valued mask estimator and the WPD++ beamformer in an end-to-end way. The results show that the proposed WPD++ outperforms several state-of-the-art beamformers on the enhanced speech quality and word error rate (WER) of ASR.

GNJan 3, 2020

Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

Shixiong Zhang, Xiangtao Li, Qiuzhen Lin et al.

In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations. In addition, we also review the upstream single-cell RNA-seq data processing techniques such as quality control, normalization, and dimension reduction. We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on two single-cell transcriptomic datasets.