Shun Zhang

CR
h-index25
62papers
17,446citations
Novelty55%
AI Score62

62 Papers

AIJul 31, 2024
The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri et al. · allen-ai, berkeley

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

99.1CVJun 1Code
Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal, Arslan Ali et al.

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 https://openmdw.ai/license/1-1/ License at https://github.com/nvidia/cosmos}{github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3 . The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3 .

CVOct 30, 2022Code
Benchmarking Adversarial Patch Against Aerial Detection

Jiawei Lian, Shaohui Mei, Shun Zhang et al.

DNNs are vulnerable to adversarial examples, which poses great security concerns for security-critical systems. In this paper, a novel adaptive-patch-based physical attack (AP-PA) framework is proposed, which aims to generate adversarial patches that are adaptive in both physical dynamics and varying scales, and by which the particular targets can be hidden from being detected. Furthermore, the adversarial patch is also gifted with attack effectiveness against all targets of the same class with a patch outside the target (No need to smear targeted objects) and robust enough in the physical world. In addition, a new loss is devised to consider more available information of detected objects to optimize the adversarial patch, which can significantly improve the patch's attack efficacy (Average precision drop up to 87.86% and 85.48% in white-box and black-box settings, respectively) and optimizing efficiency. We also establish one of the first comprehensive, coherent, and rigorous benchmarks to evaluate the attack efficacy of adversarial patches on aerial detection tasks. Finally, several proportionally scaled experiments are performed physically to demonstrate that the elaborated adversarial patches can successfully deceive aerial detection algorithms in dynamic physical circumstances. The code is available at https://github.com/JiaweiLian/AP-PA.

LGMar 9, 2023
Planning with Large Language Models for Code Generation

Shun Zhang, Zhenfang Chen, Yikang Shen et al.

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to compile or generate incorrect outputs. The main reason is that conventional Transformer decoding algorithms may not be the best choice for code generation. In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. Specifically, instead of simply optimizing the likelihood of the generated sequences, the Transformer makes use of a planner to generate candidate programs and test them on public test cases. The Transformer can therefore make more informed decisions and generate tokens that will eventually lead to higher-quality programs. We also design a mechanism that shares information between the Transformer and the planner to make our algorithm computationally efficient. We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks, showing that 1) it can generate programs that consistently achieve higher performance compared with competing baseline methods; 2) it enables controllable code generation, such as concise codes and highly-commented codes by optimizing modified objective.

LGJun 27, 2022
Prompting Decision Transformer for Few-Shot Policy Generalization

Mengdi Xu, Yikang Shen, Shun Zhang et al.

Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations, and encodes task-specific information to guide policy generation. Our experiments in five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments.

NAJul 17, 2014
Div First-Order System LL* (FOSLL*) for Second-Order Elliptic Partial Differential Equations

Zhiqiang Cai, Rob Falgout, Shun Zhang

The first-order system LL* (FOSLL*) approach for general second-order elliptic partial differential equations was proposed and analyzed in [10], in order to retain the full efficiency of the L2 norm first-order system least-squares (FOSLS) ap- proach while exhibiting the generality of the inverse-norm FOSLS approach. The FOSLL* approach in [10] was applied to the div-curl system with added slack vari- ables, and hence it is quite complicated. In this paper, we apply the FOSLL* approach to the div system and establish its well-posedness. For the corresponding finite ele- ment approximation, we obtain a quasi-optimal a priori error bound under the same regularity assumption as the standard Galerkin method, but without the restriction to sufficiently small mesh size. Unlike the FOSLS approach, the FOSLL* approach does not have a free a posteriori error estimator, we then propose an explicit residual error estimator and establish its reliability and efficiency bounds

LGApr 17, 2023
Hyper-Decision Transformer for Efficient Online Policy Adaptation

Mengdi Xu, Yuchen Lu, Yikang Shen et al.

Decision Transformers (DT) have demonstrated strong performances in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve such a goal, we propose to augment the base DT with an adaptation module, whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as inputs and initializes the adaptation module accordingly. This initialization enables HDT to efficiently adapt to novel tasks by only fine-tuning the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin.

NAApr 23, 2016
Improved ZZ A Posteriori Error Estimators for Diffusion Problems: Conforming Linear Elements

Zhiqiang Cai, Cuiyu He, Shun Zhang

In \cite{CaZh:09}, we introduced and analyzed an improved Zienkiewicz-Zhu (ZZ) estimator for the conforming linear finite element approximation to elliptic interface problems. The estimator is based on the piecewise "constant" flux recovery in the $H(div;Ω)$ conforming finite element space. This paper extends the results of \cite{CaZh:09} to diffusion problems with full diffusion tensor and to the flux recovery both in piecewise constant and piecewise linear $H(div)$ space.

NAMar 3, 2016
Residual-based a Posteriori Error Estimate for Interface Problems: Nonconforming Linear Elements

Zhiqiang Cai, Cuiyu He, Shun Zhang

In this paper, we study a modified residual-based a posteriori error estimator for the nonconforming linear finite element approximation to the interface problem. The reliability of the estimator is analyzed by a new and direct approach without using the Helmholtz decomposition. It is proved that the estimator is reliable with constant independent of the jump of diffusion coefficients across the interfaces, without the assumption that the diffusion coefficient is quasi-monotone. Numerical results for one test problem with intersecting interfaces are also presented.

CVNov 30, 2025Code
CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

Simon Kohaut, Daniel Ochs, Shun Zhang et al.

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLM) on their ability for textual reasoning over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state-of-the-art in understanding periodic patterns.

NANov 18, 2015
Complexity of Oscillatory Integrals on the Real Line

Erich Novak, Mario Ullrich, Henryk Woźniakowski et al.

We analyze univariate oscillatory integrals defined on the real line for functions from the standard Sobolev space $H^s({\mathbb{R}})$ and from the space $C^s({\mathbb{R}})$ with an arbitrary integer $s\ge1$. We find tight upper and lower bounds for the worst case error of optimal algorithms that use $n$ function values. More specifically, we study integrals of the form \[ I_k^ρ(f) = \int_{ {\mathbb{R}}} f(x) \,e^{-i\,kx} ρ(x) \, {\rm d} x\ \ \ \mbox{for}\ \ f\in H^s({\mathbb{R}})\ \ \mbox{or}\ \ f\in C^s({\mathbb{R}}) \] with $k\in {\mathbb{R}}$ and a smooth density function $ρ$ such as $ ρ(x) = \frac{1}{\sqrt{2 π}} \exp( -x^2/2) $. The optimal error bounds are $Θ((n+\max(1,|k|))^{-s})$ with the factors in the $Θ$ notation dependent only on $s$ and $ρ$.

NAMar 3, 2016
Finite Element Methods for Interface Problems: Robust Residual-Based A Posteriori Error Estimates

Zhiqiang Cai, Cuiyu He, Shun Zhang

For elliptic interface problems, this paper studies residual-based a posteriori error estimations for various finite element approximations. For the conforming and the Raviart-Thomas mixed elements in two-dimension and for the Crouzeix-Raviart nonconforming and the discontinuous Galerkin elements in both two- and three-dimensions, the global reliability bounds are established with constants independent of the jump of the diffusion coefficient. Moreover, we obtain these estimates with no assumption on the distribution of the diffusion coefficient.

NAFeb 28, 2019
Robust and Local Optimal A Priori Error Estimates for Interface Problems with Low Regularity: Mixed Finite Element Approximations

Shun Zhang

For elliptic interface problems in two- and three-dimensions with a possible very low regularity, this paper establishes a priori error estimates for the Raviart-Thomas and Brezzi-Douglas-Marini mixed finite element approximations. These estimates are robust with respect to the diffusion coefficient and optimal with respect to the local regularity of the solution. Several versions of the robust best approximations of the flux and the potential approximations are obtained. These robust and local optimal a priori estimates provide guidance for constructing robust a posteriori error estimates and adaptive methods for the mixed approximations.

NAJul 16, 2014
Recovery-Based Error Estimators for Diffusion Problems: Explicit Formulas

Zhiqiang Cai, Shun Zhang

We introduced and analyzed robust recovery-based a posteriori error estimators for various lower order finite element approximations to interface problems in [9, 10], where the recoveries of the flux and/or gradient are implicit (i.e., requiring solutions of global problems with mass matrices). In this paper, we develop fully explicit recovery-based error estimators for lower order conforming, mixed, and non- conforming finite element approximations to diffusion problems with full coefficient tensor. When the diffusion coefficient is piecewise constant scalar and its distribution is local quasi-monotone, it is shown theoretically that the estimators developed in this paper are robust with respect to the size of jumps. Numerical experiments are also performed to support the theoretical results.

ARJul 19, 2024
LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits

Chen-Chia Chang, Yikang Shen, Shaoze Fan et al.

In the realm of electronic and electrical engineering, automation of analog circuit is increasingly vital given the complexity and customized requirements of modern applications. However, existing methods only develop search-based algorithms that require many simulation iterations to design a custom circuit topology, which is usually a time-consuming process. To this end, we introduce LaMAGIC, a pioneering language model-based topology generation model that leverages supervised finetuning for automated analog circuit design. LaMAGIC can efficiently generate an optimized circuit design from the custom specification in a single pass. Our approach involves a meticulous development and analysis of various input and output formulations for circuit. These formulations can ensure canonical representations of circuits and align with the autoregressive nature of LMs to effectively addressing the challenges of representing analog circuits as graphs. The experimental results show that LaMAGIC achieves a success rate of up to 96\% under a strict tolerance of 0.01. We also examine the scalability and adaptability of LaMAGIC, specifically testing its performance on more complex circuits. Our findings reveal the enhanced effectiveness of our adjacency matrix-based circuit formulation with floating-point input, suggesting its suitability for handling intricate circuit designs. This research not only demonstrates the potential of language models in graph generation, but also builds a foundational framework for future explorations in automated analog circuit design.

CLSep 3, 2024
FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Liqun Yang, Jian Yang, Chaoren Wei et al.

Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations. Specifically, we develop a framework to leverage the code LLMs to guide the mutation process of inputs in fuzzing. The mutation process is formulated as the sequence-to-sequence modeling, where LLM receives a sequence of bytes and then outputs the mutated byte sequence. FuzzCoder is fine-tuned on the created instruction dataset (Fuzz-Instruct), where the successful fuzzing history is collected from the heuristic fuzzing tool. FuzzCoder can predict mutation locations and strategies locations in input files to trigger abnormal behaviors of the program. Experimental results show that FuzzCoder based on AFL (American Fuzzy Lop) gain significant improvements in terms of effective proportion of mutation (EPM) and number of crashes (NC) for various input formats including ELF, JPG, MP3, and XML.

CLOct 8, 2022
KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification

Yong He, Cheng Wang, Shun Zhang et al.

Medical text learning has recently emerged as a promising area to improve healthcare due to the wide adoption of electronic health record (EHR) systems. The complexity of the medical text such as diverse length, mixed text types, and full of medical jargon, poses a great challenge for developing effective deep learning models. BERT has presented state-of-the-art results in many NLP tasks, such as text classification and question answering. However, the standalone BERT model cannot deal with the complexity of the medical text, especially the lengthy clinical notes. Herein, we develop a new model called KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT) by extending the BERT model for long and multi-type text with the integration of the medical knowledge graph. Our model can outperform all baselines and other state-of-the-art models in diagnosis-related group (DRG) classification, which requires comprehensive medical text for accurate classification. We also demonstrated that our model can effectively handle multi-type text and the integration of medical knowledge graph can significantly improve the performance.

FAJul 4, 2012
Gelfand and Kolmogorov numbers of Sobolev embeddings of weighted function spaces II

Shun Zhang, Gensun Fang, Fanglun Huang

We consider the Gelfand and Kolmogorov numbers of compact embeddings between weighted function spaces of Besov and Triebel-Lizorkin type with polynomial weights in the non-limiting case. Our main purpose here is to complement our previous results in \cite{ZF10} in the context of the quasi-Banach setting, $0 < p, q \le \infty$. In addition, sharp estimates for their approximation numbers in several cases left open in Skrzypczak (JAT, 2005) are provided.

98.6CRApr 15
SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen et al.

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce \safeharness{}, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address above significant limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate \safeharness{} on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, \safeharness{} achieves an average reduction of approximately 38\% in UBR and 42\% in ASR, substantially lowering both the unsafe behavior rate and the attack success rate while preserving core task utility.

LGSep 15, 2024
Enhancing Data Quality through Self-learning on Imbalanced Financial Risk Data

Xu Sun, Zixuan Qin, Shun Zhang et al.

In the financial risk domain, particularly in credit default prediction and fraud detection, accurate identification of high-risk class instances is paramount, as their occurrence can have significant economic implications. Although machine learning models have gained widespread adoption for risk prediction, their performance is often hindered by the scarcity and diversity of high-quality data. This limitation stems from factors in datasets such as small risk sample sizes, high labeling costs, and severe class imbalance, which impede the models' ability to learn effectively and accurately forecast critical events. This study investigates data pre-processing techniques to enhance existing financial risk datasets by introducing TriEnhance, a straightforward technique that entails: (1) generating synthetic samples specifically tailored to the minority class, (2) filtering using binary feedback to refine samples, and (3) self-learning with pseudo-labels. Our experiments across six benchmark datasets reveal the efficacy of TriEnhance, with a notable focus on improving minority class calibration, a key factor for developing more robust financial risk prediction systems.

SDMay 22, 2025Code
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

Kai Li, Can Shen, Yile Liu et al.

Audio Large Language Models (ALLMs) have gained widespread adoption, yet their trustworthiness remains underexplored. Existing evaluation frameworks, designed primarily for text, fail to address unique vulnerabilities introduced by audio's acoustic properties. We identify significant trustworthiness risks in ALLMs arising from non-semantic acoustic cues, including timbre, accent, and background noise, which can manipulate model behavior. We propose AudioTrust, a comprehensive framework for systematic evaluation of ALLM trustworthiness across audio-specific risks. AudioTrust encompasses six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. The framework implements 26 distinct sub-tasks using a curated dataset of over 4,420 audio samples from real-world scenarios, including daily conversations, emergency calls, and voice assistant interactions. We conduct comprehensive evaluations across 18 experimental configurations using human-validated automated pipelines. Our evaluation of 14 state-of-the-art open-source and closed-source ALLMs reveals significant limitations when confronted with diverse high-risk audio scenarios, providing insights for secure deployment of audio models. Code and data are available at https://github.com/JusperLee/AudioTrust.

CVDec 28, 2025
Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Yi Zhou, Xuechao Zou, Shun Zhang et al.

Semi-supervised remote sensing (RS) image semantic segmentation offers a promising solution to alleviate the burden of exhaustive annotation, yet it fundamentally struggles with pseudo-label drift, a phenomenon where confirmation bias leads to the accumulation of errors during training. In this work, we propose Co2S, a stable semi-supervised RS segmentation framework that synergistically fuses priors from vision-language models and self-supervised models. Specifically, we construct a heterogeneous dual-student architecture comprising two distinct ViT-based vision foundation models initialized with pretrained CLIP and DINOv3 to mitigate error accumulation and pseudo-label drift. To effectively incorporate these distinct priors, an explicit-implicit semantic co-guidance mechanism is introduced that utilizes text embeddings and learnable queries to provide explicit and implicit class-level guidance, respectively, thereby jointly enhancing semantic consistency. Furthermore, a global-local feature collaborative fusion strategy is developed to effectively fuse the global contextual information captured by CLIP with the local details produced by DINOv3, enabling the model to generate highly precise segmentation results. Extensive experiments on six popular datasets demonstrate the superiority of the proposed method, which consistently achieves leading performance across various partition protocols and diverse scenarios. Project page is available at https://xavierjiezou.github.io/Co2S/.

CVSep 23, 2023
M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

Qibo Qiu, Honghui Yang, Wenxiao Wang et al.

Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the original points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities to capture geometric details and semantic contexts during pre-training. To this end, M$^3$CS is proposed to enable the model with the above abilities. Specifically, with masked point cloud as input, M$^3$CS introduces two decoders to predict masked representations and the original points simultaneously. While an extra decoder doubles parameters for the decoding process and may lead to overfitting, we propose siamese decoders to keep the amount of learnable parameters unchanged. Further, we propose an online codebook projecting continuous tokens into discrete ones before reconstructing masked points. In such way, we can enforce the decoder to take effect through the combinations of tokens rather than remembering each token. Comprehensive experiments show that M$^3$CS achieves superior performance at both classification and segmentation tasks, outperforming existing methods.

CVMar 9, 2025Code
Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Xuechao Zou, Yue Li, Shun Zhang et al.

Remote sensing image segmentation faces persistent challenges in distinguishing morphologically similar categories and adapting to diverse scene variations. While existing methods rely on implicit representation learning paradigms, they often fail to dynamically adjust semantic embeddings according to contextual cues, leading to suboptimal performance in fine-grained scenarios such as cloud thickness differentiation. This work introduces a dynamic dictionary learning framework that explicitly models class ID embeddings through iterative refinement. The core contribution lies in a novel dictionary construction mechanism, where class-aware semantic embeddings are progressively updated via multi-stage alternating cross-attention querying between image features and dictionary embeddings. This process enables adaptive representation learning tailored to input-specific characteristics, effectively resolving ambiguities in intra-class heterogeneity and inter-class homogeneity. To further enhance discriminability, a contrastive constraint is applied to the dictionary space, ensuring compact intra-class distributions while maximizing inter-class separability. Extensive experiments across both coarse- and fine-grained datasets demonstrate consistent improvements over state-of-the-art methods, particularly in two online test benchmarks (LoveDA and UAVid). Code is available at https://anonymous.4open.science/r/D2LS-8267/.

42.8LGMay 11
NoRIN: Backbone-Adaptive Reversible Normalization for Time-Series Forecasting

Shun Zhang, Yuyang Xiao

Reversible instance normalization (RevIN) and its successors (Dish-TS, SAN, FAN) have become the de facto plug-in for time-series forecasting, yet the map they apply to each data point is strictly affine, $x \mapsto ax+b$, so they cannot reshape the underlying distribution -- heavy tails remain heavy and skewness remains uncorrected. We propose NoRIN, a non-linear reversible normalization based on the arcsinh-form Johnson $S_U$ transform with two shape parameters $(δ,\varepsilon)$ that control tailedness and skewness; the linear $Z$-score used by RevIN is recovered only in the limit $δ\to \infty$. Training $(δ,\varepsilon)$ jointly with the backbone via gradient descent reliably pushes them toward this linear limit within a few epochs -- a phenomenon we name the degeneration problem: the forecasting loss is locally indifferent to shape, and the high-capacity backbone compensates for any monotone reparameterization of its input. NoRIN escapes the degeneration by decoupling shape selection from gradient training: $(δ,\varepsilon)$ are initialized by a closed-form Slifker-Shapiro quantile fit and refined by Bayesian optimization on the validation objective, while the inner training loop is identical to standard RevIN-style training. Across six representative backbones x five real-world datasets x three prediction horizons (90 configurations), decoupled shape optimization recovers $(δ^\star,\varepsilon^\star)$ that sit systematically far from the linear limit, with values that vary in a backbone-dependent way. This empirically supports the central thesis: different backbones genuinely require different normalization parameters to reach their best performance.

CVAug 30, 2025Code
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

Xuechao Zou, Shun Zhang, Xing Fu et al.

Controllable face generation poses critical challenges in generative modeling due to the intricate balance required between semantic controllability and photorealism. While existing approaches struggle with disentangling semantic controls from generation pipelines, we revisit the architectural potential of Diffusion Transformers (DiTs) through the lens of expert specialization. This paper introduces Face-MoGLE, a novel framework featuring: (1) Semantic-decoupled latent modeling through mask-conditioned space factorization, enabling precise attribute manipulation; (2) A mixture of global and local experts that captures holistic structure and region-level semantics for fine-grained controllability; (3) A dynamic gating network producing time-dependent coefficients that evolve with diffusion steps and spatial locations. Face-MoGLE provides a powerful and flexible solution for high-quality, controllable face generation, with strong potential in generative modeling and security applications. Extensive experiments demonstrate its effectiveness in multimodal and monomodal face generation settings and its robust zero-shot generalization capability. Project page is available at https://github.com/XavierJiezou/Face-MoGLE.

CLJun 5, 2024Code
Towards Real-world Scenario: Imbalanced New Intent Discovery

Shun Zhang, Chaoran Yan, Jian Yang et al.

New Intent Discovery (NID) aims at detecting known and previously undefined categories of user intent by utilizing limited labeled and massive unlabeled data. Most prior works often operate under the unrealistic assumption that the distribution of both familiar and new intent classes is uniform, overlooking the skewed and long-tailed distributions frequently encountered in real-world scenarios. To bridge the gap, our work introduces the imbalanced new intent discovery (i-NID) task, which seeks to identify familiar and novel intent categories within long-tailed distributions. A new benchmark (ImbaNID-Bench) comprised of three datasets is created to simulate the real-world long-tail distributions. ImbaNID-Bench ranges from broad cross-domain to specific single-domain intent categories, providing a thorough representation of practical use cases. Besides, a robust baseline model ImbaNID is proposed to achieve cluster-friendly intent representations. It includes three stages: model pre-training, generation of reliable pseudo-labels, and robust representation learning that strengthens the model performance to handle the intricacies of real-world data distributions. Our extensive experiments on previous benchmarks and the newly established benchmark demonstrate the superior performance of ImbaNID in addressing the i-NID task, highlighting its potential as a powerful baseline for uncovering and categorizing user intents in imbalanced and long-tailed distributions\footnote{\url{https://github.com/Zkdc/i-NID}}.

IVFeb 27, 2020Code
RSANet: Recurrent Slice-wise Attention Network for Multiple Sclerosis Lesion Segmentation

Hang Zhang, Jinwei Zhang, Qihao Zhang et al.

Brain lesion volume measured on T2 weighted MRI images is a clinically important disease marker in multiple sclerosis (MS). Manual delineation of MS lesions is a time-consuming and highly operator-dependent task, which is influenced by lesion size, shape and conspicuity. Recently, automated lesion segmentation algorithms based on deep neural networks have been developed with promising results. In this paper, we propose a novel recurrent slice-wise attention network (RSANet), which models 3D MRI images as sequences of slices and captures long-range dependencies through a recurrent manner to utilize contextual information of MS lesions. Experiments on a dataset with 43 patients show that the proposed method outperforms the state-of-the-art approaches. Our implementation is available online at https://github.com/tinymilky/RSANet.

LGJan 30, 2024
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

Shun Zhang, Zhenfang Chen, Sunli Chen et al.

Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could lead to inaccurate predictions. As a result, RLHF may produce outputs that are misaligned with human values. To mitigate this issue, we contribute a reward ensemble method that allows the reward model to make more accurate predictions. As using an ensemble of large language model-based reward models can be computationally and resource-expensive, we explore efficient ensemble methods including linear-layer ensemble and LoRA-based ensemble. Empirically, we run Best-of-$n$ and Proximal Policy Optimization with our ensembled reward models, and verify that our ensemble methods help improve the alignment performance of RLHF outputs.

CLFeb 17, 2024
C-ICL: Contrastive In-context Learning for Information Extraction

Ying Mo, Jiahao Liu, Jian Yang et al.

There has been increasing interest in exploring the capabilities of advanced large language models (LLMs) in the field of information extraction (IE), specifically focusing on tasks related to named entity recognition (NER) and relation extraction (RE). Although researchers are exploring the use of few-shot information extraction through in-context learning with LLMs, they tend to focus only on using correct or positive examples for demonstration, neglecting the potential value of incorporating incorrect or negative examples into the learning process. In this paper, we present c-ICL, a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. This approach enhances the ability of LLMs to extract entities and relations by utilizing prompts that incorporate not only the positive samples but also the reasoning behind them. This method allows for the identification and correction of potential interface errors. Specifically, our proposed method taps into the inherent contextual information and valuable information in hard negative samples and the nearest positive neighbors to the test and then applies the in-context learning demonstrations based on LLMs. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods, delivering substantial enhancements in performance across a broad spectrum of related tasks. These improvements are noteworthy, showcasing the versatility of our approach in miscellaneous scenarios.

CLMar 25, 2024
New Intent Discovery with Attracting and Dispersing Prototype

Shun Zhang, Jian Yang, Jiaqi Bai et al.

New Intent Discovery (NID) aims to recognize known and infer new intent categories with the help of limited labeled and large-scale unlabeled data. The task is addressed as a feature-clustering problem and recent studies augment instance representation. However, existing methods fail to capture cluster-friendly representations, since they show less capability to effectively control and coordinate within-cluster and between-cluster distances. Tailored to the NID problem, we propose a Robust and Adaptive Prototypical learning (RAP) framework for globally distinct decision boundaries for both known and new intent categories. Specifically, a robust prototypical attracting learning (RPAL) method is designed to compel instances to gravitate toward their corresponding prototype, achieving greater within-cluster compactness. To attain larger between-cluster separation, another adaptive prototypical dispersing learning (APDL) method is devised to maximize the between-cluster distance from the prototype-to-prototype perspective. Experimental results evaluated on three challenging benchmarks (CLINC, BANKING, and StackOverflow) of our method with better cluster-friendly representation demonstrate that RAP brings in substantial improvements over the current state-of-the-art methods (even large language model) by a large margin (average +5.5% improvement).

CLOct 27, 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

Maohao Shen, Shun Zhang, Jilong Wu et al.

Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, the extension of text-dominant LLMs to with speech generation tasks remains under-explored. In this work, we introduce a text-to-speech (TTS) system powered by a fine-tuned Llama model, named TTS-Llama, that achieves state-of-the-art speech synthesis performance. Building on TTS-Llama, we further propose MoLE-Llama, a text-and-speech multimodal LLM developed through purely late-fusion parameter-efficient fine-tuning (PEFT) and a mixture-of-expert architecture. Extensive empirical results demonstrate MoLE-Llama's competitive performance on both text-only question-answering (QA) and TTS tasks, mitigating catastrophic forgetting issue in either modality. Finally, we further explore MoLE-Llama in text-in-speech-out QA tasks, demonstrating its great potential as a multimodal dialog system capable of speech generation.

CVNov 20, 2024
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images

Xuechao Zou, Shun Zhang, Kai Li et al.

Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly impacts the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFM) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed to enhance the accuracy and robustness of cloud segmentation. Our method leverages a VFM pretrained on general domain data, which remains frozen, eliminating the need for additional training. Cloud-Adapter incorporates a lightweight spatial perception module that initially utilizes a convolutional neural network (ConvNet) to extract dense spatial representations. These multi-scale features are then aggregated and serve as contextual inputs to an adapting module, which modulates the frozen transformer layers within the VFM. Experimental results demonstrate that the Cloud-Adapter approach, utilizing only 0.6% of the trainable parameters of the frozen backbone, achieves substantial performance gains. Cloud-Adapter consistently achieves state-of-the-art performance across various cloud segmentation datasets from multiple satellite sources, sensor series, data processing levels, land cover scenarios, and annotation granularities. We have released the code and model checkpoints at https://xavierjiezou.github.io/Cloud-Adapter/ to support further research.

CLJan 12, 2024
Multi-Task Learning for Front-End Text Processing in TTS

Wonjune Kang, Yun Wang, Shun Zhang et al. · mit

We propose a multi-task learning (MTL) model for jointly performing three tasks that are commonly solved in a text-to-speech (TTS) front-end: text normalization (TN), part-of-speech (POS) tagging, and homograph disambiguation (HD). Our framework utilizes a tree-like structure with a trunk that learns shared representations, followed by separate task-specific heads. We further incorporate a pre-trained language model to utilize its built-in lexical and contextual knowledge, and study how to best use its embeddings so as to most effectively benefit our multi-task model. Through task-wise ablations, we show that our full model trained on all three tasks achieves the strongest overall performance compared to models trained on individual or sub-combinations of tasks, confirming the advantages of our MTL framework. Finally, we introduce a new HD dataset containing a balanced number of sentences in diverse contexts for a variety of homographs and their pronunciations. We demonstrate that incorporating this dataset into training significantly improves HD performance over only using a commonly used, but imbalanced, pre-existing dataset.

CLApr 13, 2024
RoNID: New Intent Discovery with Generated-Reliable Labels and Cluster-friendly Representations

Shun Zhang, Chaoran Yan, Jian Yang et al.

New Intent Discovery (NID) strives to identify known and reasonably deduce novel intent groups in the open-world scenario. But current methods face issues with inaccurate pseudo-labels and poor representation learning, creating a negative feedback loop that degrades overall model performance, including accuracy and the adjusted rand index. To address the aforementioned challenges, we propose a Robust New Intent Discovery (RoNID) framework optimized by an EM-style method, which focuses on constructing reliable pseudo-labels and obtaining cluster-friendly discriminative representations. RoNID comprises two main modules: reliable pseudo-label generation module and cluster-friendly representation learning module. Specifically, the pseudo-label generation module assigns reliable synthetic labels by solving an optimal transport problem in the E-step, which effectively provides high-quality supervised signals for the input of the cluster-friendly representation learning module. To learn cluster-friendly representation with strong intra-cluster compactness and large inter-cluster separation, the representation learning module combines intra-cluster and inter-cluster contrastive learning in the M-step to feed more discriminative features into the generation module. RoNID can be performed iteratively to ultimately yield a robust model with reliable pseudo-labels and cluster-friendly representations. Experimental results on multiple benchmarks demonstrate our method brings substantial improvements over previous state-of-the-art methods by a large margin of +1~+4 points.

CLApr 10, 2025
Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models

Hongcheng Guo, Juntao Yao, Boyang Wang et al.

Mixture-of-Experts (MoE) architectures have emerged as a promising paradigm for scaling large language models (LLMs) with sparse activation of task-specific experts. Despite their computational efficiency during inference, the massive overall parameter footprint of MoE models (e.g., GPT-4) introduces critical challenges for practical deployment. Current pruning approaches often fail to address two inherent characteristics of MoE systems: 1).intra-layer expert homogeneity where experts within the same MoE layer exhibit functional redundancy, and 2). inter-layer similarity patterns where deeper layers tend to contain progressively more homogeneous experts. To tackle these issues, we propose Cluster-driven Expert Pruning (C-Prune), a novel two-stage framework for adaptive task-specific compression of MoE LLMs. C-Prune operates through layer-wise expert clustering, which groups functionally similar experts within each MoE layer using parameter similarity metrics, followed by global cluster pruning, which eliminates redundant clusters across all layers through a unified importance scoring mechanism that accounts for cross-layer homogeneity. We validate C-Prune through extensive experiments on multiple MoE models and benchmarks. The results demonstrate that C-Prune effectively reduces model size while outperforming existing MoE pruning methods.

CVDec 9, 2024
Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation

Shun Zhang, Xuechao Zou, Kai Li et al.

Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pre-trained on large-scale datasets have demonstrated strong zero-shot generalization. However, directly applying them to specific tasks may lead to domain shift. We introduce a novel end-to-end learning paradigm combining knowledge guidance with domain refinement to enhance performance. We present two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those from the pretrained VTM's encoder using channel transformation and spatial interpolation, and transfers knowledge via KL divergence and L2 normalization constraint. FMM further adapts the knowledge to the specific domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and demonstrate, through experiments on two datasets, that our method achieves a significant improvement of 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. The project page is available at https://xavierjiezou.github.io/KTDA/.

60.5CRMar 13
Bipartite Randomized Response Mechanism for Local Differential Privacy

Shun Zhang, Hai Zhu, Zhili Chen et al.

With the increasing importance of data privacy, Local Differential Privacy (LDP) has recently become a strong measure of privacy for protecting each user's privacy from data analysts without relying on a trusted third party. In this paper, we consider the problem of high-utility differentially private release. Given a domain of items and a distance-defined utility function, our goal is to design a differentially private mechanism that releases an item with the global expected error as small as possible. The most common LDP mechanism for this task is the Generalized Randomized Response (GRR) mechanism that treats all candidate items equally except for the true item. In this paper, we introduce Bipartite Randomized Response mechanism (BRR), which adaptively divides all candidate items into two parts by utility rankings. In the local search phase, we confirm how many high-utility candidates to be assigned with high release probability, which gives the locally optimal bipartite classification of all candidates. For preserving LDP, the global search phase uniformly selects the smallest number of dynamic high-utility candidates obtained locally. In particular, we give explicit formulas on the uniform number of dynamic high-utility candidates. The global expected error of our BRR can theoretically deliver a decrease with an asymptotically exact ratio, and when the privacy budget is set to $3$ the expected error can be reduced by $66.4\%$. Extensive experiments demonstrate that BRR outperforms the state-of-the-art methods across the standard metrics and datasets.

CRJan 27, 2022
Geo-MOEA: A Multi-Objective Evolutionary Algorithm with Geo-obfuscation for Mobile Crowdsourcing Workers

Shun Zhang, Tao Zhang, Zhili Chen et al.

The rapid development of mobile Internet and sharing economy brings the prosperity of Spatial Crowdsourcing (SC). SC applications assign various tasks according to reported location information of task's requesters and outsourced workers (such as DiDi, MeiTuan and Uber). However, SC-servers are often untrustworthy and the exposure of users' locations raises privacy concerns. In this paper, we design a framework called Geo-MOEA (Multi-Objective Evolutionary Algorithm with Geo-obfuscation) to protect location privacy of workers involved on SC platform in mobile networks environment. We propose an adaptive regionalized obfuscation approach with inference error bounds based on geo-indistinguishability (a strong notion of differential privacy), which is suitable for the context of large-scale location data and task allocations. This enables each worker to report a pseudo-location that is adaptively generated with a personalized inference error threshold. Moreover, as a popular computational intelligence method, MOEA is introduced to optimize the trade-off between SC service availability and privacy protection while ensuring theoretically the most general condition on protection location sets for larger search space. Finally, the experimental results on two public datasets show that our Geo-MOEA approach achieves up to 20% reduction in service quality loss while guaranteeing differential and geo-distortion location privacy.

ITAug 9, 2021
Deep Learning Based Antenna-time Domain Channel Extrapolation for Hybrid mmWave Massive MIMO

Shunbo Zhang, Shun Zhang, Jianpeng Ma et al.

In a time-varying massive multiple-input multipleoutput (MIMO) system, the acquisition of the downlink channel state information at the base station (BS) is a very challenging task due to the prohibitively high overheads associated with downlink training and uplink feedback. In this paper, we consider the hybrid precoding structure at BS and examine the antennatime domain channel extrapolation. We design a latent ordinary differential equation (ODE)-based network under the variational auto-encoder (VAE) framework to learn the mapping function from the partial uplink channels to the full downlink ones at the BS side. Specifically, the gated recurrent unit is adopted for the encoder and the fully-connected neural network is used for the decoder. The end-to-end learning is utilized to optimize the network parameters. Simulation results show that the designed network can efficiently infer the full downlink channels from the partial uplink ones, which can significantly reduce the channel training overhead.

SPMay 14, 2021
Deep Learning Based RIS Channel Extrapolation with Element-grouping

Shunbo Zhang, Shun Zhang, Feifei Gao et al.

Reconfigurable intelligent surface (RIS) is considered as a revolutionary technology for future wireless communication networks. In this letter, we consider the acquisition of the cascaded channels, which is a challenging task due to the massive number of passive RIS elements. To reduce the pilot overhead, we adopt the element-grouping strategy, where each element in one group shares the same reflection coefficient and is assumed to have the same channel condition. We analyze the channel interference caused by the element-grouping strategy and further design two deep learning based networks. The first one aims to refine the partial channels by eliminating the interference, while the second one tries to extrapolate the full channels from the refined partial channels. We cascade the two networks and jointly train them. Simulation results show that the proposed scheme provides significant gain compared to the conventional element-grouping method without interference elimination.

CRFeb 1, 2021
DPIVE: A Regionalized Location Obfuscation Scheme with Personalized Privacy Levels

Shun Zhang, Pengfei Lan, Benfei Duan et al.

The popularity of cyber-physical systems is fueling the rapid growth of location-based services. This poses the risk of location privacy disclosure. Effective privacy preservation is foremost for various mobile applications. Recently, geo-indistinguishability and expected inference error are proposed for limiting location leakages. In this paper, we argue that personalization means regionalization for geo-indistinguishability, and we propose a regionalized location obfuscation mechanism called DPIVE with personalized utility sensitivities. This substantially corrects the differential and distortion privacy problem of PIVE framework proposed by Yu et al. on NDSS 2017. We develop DPIVE with two phases. In Phase I, we determine disjoint sets by partitioning all possible positions such that different locations in the same set share the Protection Location Set (PLS). In Phase II, we construct a probability distribution matrix in which the rows corresponding to the same PLS have their own sensitivity of utility (PLS diameter). Moreover, by designing QK-means algorithm for more search space in 2-D space, we improve DPIVE with refined location partition and present fine-grained personalization, enabling each location to have its own privacy level endowed with a customized privacy budget. Experiments with two public datasets demonstrate that our mechanisms have the superior performance, typically on skewed locations.

CROct 3, 2020
Utility-efficient Differentially Private K-means Clustering based on Cluster Merging

Tianjiao Ni, Minghao Qiao, Zhili Chen et al.

Differential privacy is widely used in data analysis. State-of-the-art $k$-means clustering algorithms with differential privacy typically add an equal amount of noise to centroids for each iterative computation. In this paper, we propose a novel differentially private $k$-means clustering algorithm, DP-KCCM, that significantly improves the utility of clustering by adding adaptive noise and merging clusters. Specifically, to obtain $k$ clusters with differential privacy, the algorithm first generates $n \times k$ initial centroids, adds adaptive noise for each iteration to get $n \times k$ clusters, and finally merges these clusters into $k$ ones. We theoretically prove the differential privacy of the proposed algorithm. Surprisingly, extensive experimental results show that: 1) cluster merging with equal amounts of noise improves the utility somewhat; 2) although adding adaptive noise only does not improve the utility, combining both cluster merging and adaptive noise further improves the utility significantly.

SPSep 3, 2020
Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO

Yindi Yang, Shun Zhang, Feifei Gao et al.

In massive multiple-input multiple-output (MIMO) systems, the large number of antennas would bring a great challenge for the acquisition of the accurate channel state information, especially in the frequency division duplex mode. To overcome the bottleneck of the limited number of radio links in hybrid beamforming, we utilize the neural networks (NNs) to capture the inherent connection between the uplink and downlink channel data sets and extrapolate the downlink channels from a subset of the uplink channel state information. We study the antenna subset selection problem in order to achieve the best channel extrapolation and decrease the data size of NNs. The probabilistic sampling theory is utilized to approximate the discrete antenna selection as a continuous and differentiable function, which makes the back propagation of the deep learning feasible. Then, we design the proper off-line training strategy to optimize both the antenna selection pattern and the extrapolation NNs. Finally, numerical results are presented to verify the effectiveness of our proposed massive MIMO channel extrapolation algorithm.

SPSep 3, 2020
Deep Learning Optimized Sparse Antenna Activation for Reconfigurable Intelligent Surface Assisted Communication

Shunbo Zhang, Shun Zhang, Feifei Gao et al.

To capture the communications gain of the massive radiating elements with low power cost, the conventional reconfigurable intelligent surface (RIS) usually works in passive mode. However, due to the cascaded channel structure and the lack of signal processing ability, it is difficult for RIS to obtain the individual channel state information and optimize the beamforming vector. In this paper, we add signal processing units for a few antennas at RIS to partially acquire the channels. To solve the crucial active antenna selection problem, we construct an active antenna selection network that utilizes the probabilistic sampling theory to select the optimal locations of these active antennas. With this active antenna selection network, we further design two deep learning (DL) based schemes, i.e., the channel extrapolation scheme and the beam searching scheme, to enable the RIS communication system. The former utilizes the selection network and a convolutional neural network to extrapolate the full channels from the partial channels received by the active RIS antennas, while the latter adopts a fully-connected neural network to achieve the direct mapping between the partial channels and the optimal beamforming vector with maximal transmission rate. Simulation results are provided to demonstrate the effectiveness of the designed DL-based schemes.

CRAug 8, 2020
A Differentially Private Framework for Spatial Crowdsourcing with Historical Data Learning

Shun Zhang, Benfei Duan, Zhili Chen et al.

Spatial crowdsourcing (SC) is an increasing popular category of crowdsourcing in the era of mobile Internet and sharing economy. It requires workers to arrive at a particular location for task fulfillment. Effective protection of location privacy is essential for workers' enthusiasm and valid task assignment. However, existing SC models with differential privacy usually perturb real-time location data for both partition and data publication. Such a way may produce large perturbations to counting queries that affect assignment success rate and allocation accuracy. This paper proposes a framework (R-HT) for protecting location privacy of workers taking advantage of both real-time and historical data. We simulate locations by sampling the probability distribution learned from historical data, use them for grid partition, and then publish real-time data under this partitioning with differential privacy. This realizes that most privacy budget is allocated to the worker count of each cell and yields an improved Private Spatial Decomposition approach. Moreover, we introduce some strategies for geocast region construction, including quality scoring function and local maximum geocast radius. A series of experimental results on real-world datasets shows that R-HT attains a stable success rate of task assignment, saves performance overhead and fits for dynamic assignment on crowdsourcing platforms.

CRJan 3, 2020
Differentially Private Combinatorial Cloud Auction

Tianjiao Ni, Zhili Chen, Lin Chen et al.

Cloud service providers typically provide different types of virtual machines (VMs) to cloud users with various requirements. Thanks to its effectiveness and fairness, auction has been widely applied in this heterogeneous resource allocation. Recently, several strategy-proof combinatorial cloud auction mechanisms have been proposed. However, they fail to protect the bid privacy of users from being inferred from the auction results. In this paper, we design a differentially private combinatorial cloud auction mechanism (DPCA) to address this privacy issue. Technically, we employ the exponential mechanism to compute a clearing unit price vector with a probability proportional to the corresponding revenue. We further improve the mechanism to reduce the running time while maintaining high revenues, by computing a single clearing unit price, or a subgroup of clearing unit prices at a time, resulting in the improved mechanisms DPCA-S and its generalized version DPCA-M, respectively. We theoretically prove that our mechanisms can guarantee differential privacy, approximate truthfulness and high revenue. Extensive experimental results demonstrate that DPCA can generate near-optimal revenues at the price of relatively high time complexity, while the improved mechanisms achieve a tunable trade-off between auction revenue and running time.

CRAug 10, 2019
Differentially Private Aggregated Mobility Data Publication Using Moving Characteristics

Zhili Chen, Xiaoli Kan, Shun Zhang et al.

With the rapid development of GPS enabled devices (smartphones) and location-based applications, location privacy is increasingly concerned. Intuitively, it is widely believed that location privacy can be preserved by publishing aggregated mobility data, such as the number of users in an area at some time. However, a recent attack shows that these aggregated mobility data can be exploited to recover individual trajectories. In this paper, we first propose two differentially private basic schemes for aggregated mobility data publication, namely direct perturbation and threshold perturbation, which preserve location privacy of users and especially resist the trajectory recovery attack. Then, we explore the moving characteristics of mobile users, and design an improved scheme named static hybrid perturbation by combining the two basic schemes according to the moving characteristics. Since static hybrid perturbation works only for static data, which are entirely available before publishing, we further adapt the static hybrid perturbation by combining it with linear regression, and yield another improved scheme named dynamic hybrid perturbation. The dynamic hybrid perturbation works also for dynamic data, which are generated on the fly during publication. Privacy analysis shows that the proposed schemes achieve differential privacy. Extensive experiments on both simulated and real datasets demonstrate that all proposed schemes resist the trajectory recovery attack well, and the improved schemes significantly outperform the basic schemes.

MLApr 24, 2019
$S^{2}$-LBI: Stochastic Split Linearized Bregman Iterations for Parsimonious Deep Learning

Yanwei Fu, Donghao Li, Xinwei Sun et al.

This paper proposes a novel Stochastic Split Linearized Bregman Iteration ($S^{2}$-LBI) algorithm to efficiently train the deep network. The $S^{2}$-LBI introduces an iterative regularization path with structural sparsity. Our $S^{2}$-LBI combines the computational efficiency of the LBI, and model selection consistency in learning the structural sparsity. The computed solution path intrinsically enables us to enlarge or simplify a network, which theoretically, is benefited from the dynamics property of our $S^{2}$-LBI algorithm. The experimental results validate our $S^{2}$-LBI on MNIST and CIFAR-10 dataset. For example, in MNIST, we can either boost a network with only 1.5K parameters (1 convolutional layer of 5 filters, and 1 FC layer), achieves 98.40\% recognition accuracy; or we simplify $82.5\%$ of parameters in LeNet-5 network, and still achieves the 98.47\% recognition accuracy. In addition, we also have the learning results on ImageNet, which will be added in the next version of our report.