He Zhang

h-index 30

4 papers

80 citations

Novelty 36%

AI Score 31

Ranked #132,596 of 194,257 authors (top 68%)

#23,970 in CL (top 78%)

4 Papers

13.5 | CV | Sep 13, 2024

GroundingBooth: Grounding Text-to-Image Customization

Zhexiao Xiong, Wei Xiong, Jing Shi et al.

Recent approaches in text-to-image customization have primarily focused on preserving the identity of the input subject, but often fail to control the spatial location and size of objects. We introduce GroundingBooth, which achieves zero-shot, instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task. Our proposed grounding module and subject-grounded cross-attention layer enable the creation of personalized images with accurate layout alignment, identity preservation, and strong text-image coherence. In addition, our model seamlessly supports personalization with multiple subjects. Our model shows strong results in both layout-guided image synthesis and text-to-image customization tasks. The project page is available at https://groundingbooth.github.io.

13.9 | CL | Jun 28, 2025

VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

Raghavv Goel, Sudhanshu Agrawal, Mukul Gagrani et al.

In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree consisting of multiple tokens, which is then verified by a base LLM, the target model, that accepts a subset as its valid generation. Because speculative decoding is usually considered to require a one-to-one mapping between the vocabularies of the target model and the draft model, it has been natural to share the vocabulary between them, or even share the LM head as in EAGLE or Medusa. We first identify that this draft token sampling scheme inherently contains an unnecessary inference overhead in drafting, especially for some target LLMs with very large vocabularies. Then, we propose a simple technique, VocabTrim, to mitigate the drafting overhead and improve the generation speed in memory-bound environments. VocabTrim reconstructs the drafter LM head to contain only a limited set of tokens, selected as those most frequently sampled from the vocabulary of the target model. While limiting the vocabulary in drafting slightly degrades the acceptance rate, it significantly reduces the drafting latency in memory-bound processes, which is often the case on edge devices, resulting in a higher memory-bound speed-up (MBSU). We show that our method can boost the memory-bound speed-up for Llama-3 models on Spec-Bench, specifically by 16% for Llama-3.2-3B-Instruct.
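
The head-trimming idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 6-token vocabulary, the 2-dimensional hidden state, and the calibration token list are all assumptions made for illustration. The sketch counts how often each token was sampled, keeps the top `k`, slices the LM head down to those rows, and maps the drafter's choice back to a full-vocabulary id.

```python
from collections import Counter


def select_kept_ids(sampled_token_ids, k):
    """Return the k most frequently sampled token ids, in ascending order."""
    counts = Counter(sampled_token_ids)
    return sorted(tok for tok, _ in counts.most_common(k))


def trim_lm_head(lm_head, kept_ids):
    """Keep only the LM-head rows (one row per token) listed in kept_ids."""
    return [lm_head[i] for i in kept_ids]


def draft_greedy_token(hidden, trimmed_head, kept_ids):
    """Pick the best token under the trimmed head and map it to a full-vocab id."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in trimmed_head]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return kept_ids[best]


# Hypothetical toy data: a 6-token vocabulary with a 2-dimensional hidden state.
full_head = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
    [2.0, 2.0],
]
# Hypothetical token ids sampled from the target model on calibration text.
calibration_tokens = [2, 2, 1, 2, 1, 0, 5, 1]

kept_ids = select_kept_ids(calibration_tokens, k=2)
trimmed_head = trim_lm_head(full_head, kept_ids)
draft_token = draft_greedy_token([1.0, 0.5], trimmed_head, kept_ids)
print(kept_ids, draft_token)
```

In this toy setup the drafter computes 2 logits instead of 6. Token 5 would win under the full head but is pruned because it was rarely sampled, which mirrors the acceptance-rate trade-off the abstract describes.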

10.4 | SE | Mar 17, 2021

CrowdSim: A Hybrid Simulation Model for Failure Prediction in Crowdsourced Software Development

Razieh Saremi, Ye Yang, Gregg Vesonder et al.

A typical crowdsourcing software development (CSD) marketplace consists of a list of software tasks as service demands and a pool of freelance developers as service suppliers. Highly dynamic and competitive CSD marketplaces may result in task failure due to unforeseen risks, such as increased competition over a shared worker supply or uncertainty associated with workers' experience and skills. To improve CSD effectiveness, it is essential to better understand and plan with respect to dynamic worker characteristics and the risks associated with CSD processes. In this paper, we present a hybrid simulation model, CrowdSim, to forecast crowdsourcing task failure risk in competitive CSD platforms. CrowdSim is composed of three layered components: the macro-level reflects the overall crowdsourcing platform based on system dynamics, the meso-level represents the task life cycle based on discrete event simulation, and the micro-level models the crowd workers' decision-making processes based on agent-based simulation. CrowdSim is evaluated through three CSD decision scenarios using a real-world historical dataset. The results demonstrate CrowdSim's potential in empowering crowdsourcing managers to explore crowdsourcing outcomes with respect to different task scheduling options.

13.0 | CR | Jan 23, 2019

Deep Adversarial Learning in Intrusion Detection: A Data Augmentation Enhanced Framework

He Zhang, Xingrui Yu, Peng Ren et al.

Intrusion detection systems (IDSs) play an important role in identifying malicious attacks and threats in networking systems. As fundamental tools of IDSs, learning-based classification methods have been widely employed. When it comes to detecting network intrusions with small sample sizes (e.g., emerging intrusions), the limited number and imbalanced proportion of training samples usually cause significant challenges in training supervised and semi-supervised classifiers. In this paper, we propose a general network intrusion detection framework to address the challenges of both data scarcity and data imbalance. The novelty of the proposed framework lies in incorporating deep adversarial learning with statistical learning and exploiting learning-based data augmentation. Given a small set of network intrusion samples, it first derives a Poisson-Gamma joint probabilistic generative model to generate synthesised intrusion data using Monte Carlo methods. The synthesised data are then augmented by deep generative neural networks through adversarial learning. Finally, it adopts the augmented intrusion data to train supervised models for detecting network intrusions. Comprehensive experimental validations on the KDD Cup 99 dataset show that the proposed framework outperforms existing learning-based IDSs in terms of accuracy, precision, recall, and F1-score.
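
As a rough illustration of the first stage, the sketch below draws synthetic count values from a generic Poisson-Gamma hierarchy by Monte Carlo sampling: a rate is drawn from a Gamma distribution, then a count is drawn from a Poisson with that rate. The abstract does not give the paper's actual parameterisation or feature layout, so the single count feature, the `shape` and `scale` values, and the function names are assumptions. The adversarial augmentation stage is not shown.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    Suitable for the small rates used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_poisson_gamma(shape, scale, n, seed=0):
    """Draw n counts: rate ~ Gamma(shape, scale), then count ~ Poisson(rate)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        rate = rng.gammavariate(shape, scale)  # second argument is the scale
        samples.append(sample_poisson(rate, rng))
    return samples


# Hypothetical hyperparameters; in practice they would be fitted to the few
# real intrusion samples available.
synthetic_counts = sample_poisson_gamma(shape=2.0, scale=1.5, n=5000, seed=0)
mean_count = sum(synthetic_counts) / len(synthetic_counts)
print(round(mean_count, 2))
```

The sample mean should land near `shape * scale` (3.0 here), and the counts are over-dispersed relative to a plain Poisson with the same mean, which is the usual reason for placing a Gamma prior on the rate.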