He Zhang

h-index13

10papers

882citations

Novelty38%

AI Score32

Ranked #122,212 of 194,257 authors (top 63%)#40,617 in CV (top 69%)

10 Papers

24.8CLOct 30, 2022Code

How Far are We from Robust Long Abstractive Summarization?

Huan Yee Koh, Jiaxin Ju, He Zhang et al.

Abstractive summarization has made tremendous progress in recent years. In this work, we perform fine-grained human annotations to evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of implementing them to generate reliable summaries. For long document abstractive models, we show that the constant strive for state-of-the-art ROUGE results can lead us to generate more relevant summaries but not factual ones. For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevancy of a summary. It also reveals important limitations of factuality metrics in detecting different types of factual errors and the reasons behind the effectiveness of BARTScore. We then suggest promising directions in the endeavor of developing factual consistency metrics. Finally, we release our annotated long document dataset with the hope that it can contribute to the development of metrics across a broader range of summarization settings.

13.0LGJun 17, 2022

A Flexible Diffusion Model

Weitao Du, Tao Yang, He Zhang et al.

Diffusion (score-based) generative models have been widely used for modeling various types of complex data, including images, audios, and point clouds. Recently, the deep connection between forward-backward stochastic differential equations (SDEs) and diffusion-based models has been revealed, and several new variants of SDEs are proposed (e.g., sub-VP, critically-damped Langevin) along this line. Despite the empirical success of the hand-crafted fixed forward SDEs, a great quantity of proper forward SDEs remain unexplored. In this work, we propose a general framework for parameterizing the diffusion model, especially the spatial part of the forward SDE. An abstract formalism is introduced with theoretical guarantees, and its connection with previous diffusion models is leveraged. We demonstrate the theoretical advantage of our method from an optimization perspective. Numerical experiments on synthetic datasets, MINIST and CIFAR10 are also presented to validate the effectiveness of our framework.

20.2CVMay 6, 2017Code

Sparse Representation-based Open Set Recognition

He Zhang, Vishal M. Patel

We propose a generalized Sparse Representation- based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets and it is shown that this method can perform significantly better than many competitive open set recognition algorithms. Code is public available: https://github.com/hezhangsprinter/SROSR

15.3CVDec 4, 2024

DIVE: Taming DINO for Subject-Driven Video Editing

Yi Huang, Wei Xiong, He Zhang et al.

Building on the success of diffusion models in image generation and editing, video editing has recently gained substantial attention. However, maintaining temporal consistency and motion alignment still remains challenging. To address these issues, this paper proposes DINO-guided Video Editing (DIVE), a framework designed to facilitate subject-driven editing in source videos conditioned on either target text prompts or reference images with specific identities. The core of DIVE lies in leveraging the powerful semantic features extracted from a pretrained DINOv2 model as implicit correspondences to guide the editing process. Specifically, to ensure temporal motion consistency, DIVE employs DINO features to align with the motion trajectory of the source video. For precise subject editing, DIVE incorporates the DINO features of reference images into a pretrained text-to-image model to learn Low-Rank Adaptations (LoRAs), effectively registering the target subject's identity. Extensive experiments on diverse real-world videos demonstrate that our framework can achieve high-quality editing results with robust motion consistency, highlighting the potential of DINO to contribute to video editing. Project page: https://dino-video-editing.github.io

23.0CVMay 29, 2023

Photoswap: Personalized Subject Swapping in Images

Jing Gu, Yilin Wang, Nanxuan Zhao et al.

In an era where images and visual content dominate our digital landscape, the ability to manipulate and personalize these images has become a necessity. Envision seamlessly substituting a tabby cat lounging on a sunlit window sill in a photograph with your own playful puppy, all while preserving the original charm and composition of the image. We present Photoswap, a novel approach that enables this immersive image editing experience through personalized subject swapping in existing images. Photoswap first learns the visual concept of the subject from reference images and then swaps it into the target image using pre-trained diffusion models in a training-free manner. We establish that a well-conceptualized visual subject can be seamlessly transferred to any image with appropriate self-attention and cross-attention manipulation, maintaining the pose of the swapped subject and the overall coherence of the image. Comprehensive experiments underscore the efficacy and controllability of Photoswap in personalized subject swapping. Furthermore, Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality, revealing its vast application potential, from entertainment to professional editing.

12.4LGFeb 25, 2022

Projective Ranking-based GNN Evasion Attacks

He Zhang, Xingliang Yuan, Chuan Zhou et al.

Graph neural networks (GNNs) offer promising learning methods for graph-related tasks. However, GNNs are at risk of adversarial attacks. Two primary limitations of the current evasion attack methods are highlighted: (1) The current GradArgmax ignores the "long-term" benefit of the perturbation. It is faced with zero-gradient and invalid benefit estimates in certain situations. (2) In the reinforcement learning-based attack methods, the learned attack strategies might not be transferable when the attack budget changes. To this end, we first formulate the perturbation space and propose an evaluation framework and the projective ranking method. We aim to learn a powerful attack strategy then adapt it as little as possible to generate adversarial samples under dynamic budget settings. In our method, based on mutual information, we rank and assess the attack benefits of each perturbation for an effective attack strategy. By projecting the strategy, our method dramatically minimizes the cost of learning a new attack strategy when the attack budget changes. In the comparative assessment with GradArgmax and RL-S2V, the results show our method owns high attack performance and effective transferability. The visualization of our method also reveals various attack patterns in the generation of adversarial samples.

19.7CVDec 20, 2021Code

Lite Vision Transformer with Enhanced Self-Attention

Chenglin Yang, Yilin Wang, Jianming Zhang et al.

Despite the impressive representation capacity of vision transformer models, current light-weight vision transformer models still suffer from inconsistent and incorrect dense predictions at local regions. We suspect that the power of their self-attention mechanism is limited in shallower and thinner networks. We propose Lite Vision Transformer (LVT), a novel light-weight transformer network with two enhanced self-attention mechanisms to improve the model performances for mobile deployment. For the low-level features, we introduce Convolutional Self-Attention (CSA). Unlike previous approaches of merging convolution and self-attention, CSA introduces local self-attention into the convolution within a kernel of size 3x3 to enrich low-level features in the first stage of LVT. For the high-level features, we propose Recursive Atrous Self-Attention (RASA), which utilizes the multi-scale context when calculating the similarity map and a recursive mechanism to increase the representation capability with marginal extra parameter cost. The superiority of LVT is demonstrated on ImageNet recognition, ADE20K semantic segmentation, and COCO panoptic segmentation. The code is made publicly available.

6.4CLJul 27, 2021

Federated Learning Meets Natural Language Processing: A Survey

Ming Liu, Stella Ho, Mengqi Wang et al.

Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g. mobiles) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both big deep neural and language models are trained with huge amounts of data which often lies on the server side. Since text data is widely originated from end users, in this work, we look into recent NLP models and techniques which use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including the algorithm challenges, system challenges as well as the privacy issues. We also provide a critical review of the existing Federated NLP evaluation methods and tools. Finally, we highlight the current research gaps and future directions.

3.6SEMar 24, 2021Code

Exploiting the Unique Expression for Improved Sentiment Analysis in Software Engineering Text

Kexin Sun, Hui Gao, Hongyu Kuang et al.

Sentiment analysis on software engineering (SE) texts has been widely used in the SE research, such as evaluating app reviews or analyzing developers sentiments in commit messages. To better support the use of automated sentiment analysis for SE tasks, researchers built an SE-domain-specified sentiment dictionary to further improve the accuracy of the results. Unfortunately, recent work reported that current mainstream tools for sentiment analysis still cannot provide reliable results when analyzing the sentiments in SE texts. We suggest that the reason for this situation is because the way of expressing sentiments in SE texts is largely different from the way in social network or movie comments. In this paper, we propose to improve sentiment analysis in SE texts by using sentence structures, a different perspective from building a domain dictionary. Specifically, we use sentence structures to first identify whether the author is expressing her sentiment in a given clause of an SE text, and to further adjust the calculation of sentiments which are confirmed in the clause. An empirical evaluation based on four different datasets shows that our approach can outperform two dictionary-based baseline approaches, and is more generalizable compared to a learning-based baseline approach.

2.9SEApr 3, 2017

Review on Requirements Modeling and Analysis for Self-Adaptive Systems: A Ten-Year Perspective

Zhuoqun Yang, Zhi Li, Zhi Jin et al.

Context: Over the last decade, software researchers and engineers have developed a vast body of methodologies and technologies in requirements engineering for self-adaptive systems. Although existing studies have explored various aspects of this field, no systematic study has been performed on summarizing modeling methods and corresponding requirements activities. Objective: This study summarizes the state-of-the-art research trends, details the modeling methods and corresponding requirements activities, identifies relevant quality attributes and application domains and assesses the quality of each study. Method: We perform a systematic literature review underpinned by a rigorously established and reviewed protocol. To ensure the quality of the study, we choose 21 highly regarded publication venues and 8 popular digital libraries. In addition, we apply text mining to derive search strings and use Kappa coefficient to mitigate disagreements of researchers. Results: We selected 109 papers during the period of 2003-2013 and presented the research distributions over various kinds of factors. We extracted 29 modeling methods which are classified into 8 categories and identified 14 requirements activities which are classified into 4 requirements timelines. We captured 8 concerned software quality attributes based on the ISO 9126 standard and 12 application domains. Conclusion: The frequency of application of modeling methods varies greatly. Enterprise models were more widely used while behavior models were more rigorously evaluated. Requirements-driven runtime adaptation was the most frequently studied requirements activity. Activities at runtime were conveyed with more details. Finally, we draw other conclusions by discussing how well modeling dimensions were considered in these modeling methods and how well assurance dimensions were conveyed in requirements activities.