Bo Zhang

CV
h-index35
16papers
570citations
Novelty53%
AI Score36

16 Papers

8.7CVNov 12, 2024Code
MureObjectStitch: Multi-reference Image Composition

Jiaxuan Chen, Bo Zhang, Qingdong He et al.

Generative image composition aims to regenerate the given foreground object in the background image to produce a realistic composite image. The existing methods are struggling to preserve the foreground details and adjust the foreground pose/viewpoint at the same time. In this work, we propose an effective finetuning strategy for generative image composition model, in which we finetune a pretrained model using one or more images containing the same foreground object. Moreover, we propose a multi-reference strategy, which allows the model to take in multiple reference images of the foreground object. The experiments on MureCOM dataset verify the effectiveness of our method. The code and model have been released at https://github.com/bcmi/MureObjectStitch-Image-Composition.

6.5CVDec 4, 2024
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception

Runjian Chen, Hyoungseob Park, Bo Zhang et al.

Labeling LiDAR point clouds is notoriously time-and-energy-consuming, which spurs recent unsupervised 3D representation learning methods to alleviate the labeling burden in LiDAR perception via pretrained weights. Almost all existing work focus on a single frame of LiDAR point cloud and neglect the temporal LiDAR sequence, which naturally accounts for object motion (and their semantics). Instead, we propose TREND, namely Temporal REndering with Neural fielD, to learn 3D representation via forecasting the future observation in an unsupervised manner. Unlike existing work that follows conventional contrastive learning or masked auto encoding paradigms, TREND integrates forecasting for 3D pre-training through a Recurrent Embedding scheme to generate 3D embedding across time and a Temporal Neural Field to represent the 3D scene, through which we compute the loss using differentiable rendering. To our best knowledge, TREND is the first work on temporal forecasting for unsupervised 3D representation learning. We evaluate TREND on downstream 3D object detection tasks on popular datasets, including NuScenes, Once and Waymo. Experiment results show that TREND brings up to 90% more improvement as compared to previous SOTA unsupervised 3D pre-training methods and generally improve different downstream models across datasets, demonstrating that indeed temporal forecasting brings improvement for LiDAR perception. Codes and models will be released.

2.3CRMay 28, 2023Code
Amplification trojan network: Attack deep neural networks by amplifying their inherent weakness

Zhanhao Hu, Jun Zhu, Bo Zhang et al.

Recent works found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise on clean inputs. The accuracy of DNNs on adversarial examples will decrease as the magnitude of the adversarial noise increase. In this study, we show that DNNs can be also fooled when the noise is very small under certain circumstances. This new type of attack is called Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before sending them to the target DNN. This trojan network serves as an amplifier to amplify the inherent weakness of the target DNN. The target DNN, which is infected by the trojan network, performs normally on clean data while being more vulnerable to adversarial examples. Since it only transforms the inputs, the trojan network can hide in DNN-based pipelines, e.g. by infecting the pre-processing procedure of the inputs before sending them to the DNNs. This new type of threat should be considered in developing safe DNNs.

11.9LGJun 30, 2021Code
Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning

You Qiaoben, Chengyang Ying, Xinning Zhou et al.

Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease a victim's cumulative expected reward by manipulating the victim's observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not be able to achieve the lowest cumulative reward since they do not explore the environmental dynamics in general. In this paper, we provide a framework to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. Our reformulation generates an optimal adversary in the function space of the targeted attacks, repelling them via a generic two-stage framework. In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim to imitate the deceptive policy by perturbing the observations. Compared to existing approaches, we theoretically show that our adversary is stronger under an appropriate noise level. Extensive experiments demonstrate our method's superiority in terms of efficiency and effectiveness, achieving the state-of-the-art performance in both Atari and MuJoCo environments.

24.7CVSep 27, 2019
Pruning from Scratch

Yulong Wang, Xiaolu Zhang, Lingxi Xie et al.

Network pruning is an important research field aiming at reducing computational costs of neural networks. Conventional approaches follow a fixed paradigm which first trains a large and redundant network, and then determines which units (e.g., channels) are less important and thus can be removed. In this work, we find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. In fact, a fully-trained over-parameterized model will reduce the search space for the pruned structure. We empirically show that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance. Therefore, we propose a novel network pruning pipeline which allows pruning from scratch. In the experiments for compressing classification models on CIFAR10 and ImageNet datasets, our approach not only greatly reduces the pre-training burden of traditional pruning methods, but also achieves similar or even higher accuracy under the same computation budgets. Our results facilitate the community to rethink the effectiveness of existing techniques used for network pruning.

7.7LGJun 10, 2019Code
Multi-objects Generation with Amortized Structural Regularization

Kun Xu, Chongxuan Li, Jun Zhu et al.

Deep generative models (DGMs) have shown promise in image generation. However, most of the existing work learn the model by simply optimizing a divergence between the marginal distributions of the model and the data, and often fail to capture the rich structures and relations in multi-object images. Human knowledge is a critical element to the success of DGMs to infer these structures. In this paper, we propose the amortized structural regularization (ASR) framework, which adopts the posterior regularization (PR) to embed human knowledge into DGMs via a set of structural constraints. We derive a lower bound of the regularized log-likelihood, which can be jointly optimized with respect to the generative model and recognition model efficiently. Empirical results show that ASR significantly outperforms the DGM baselines in terms of inference accuracy and sample quality.

20.5MLFeb 26, 2019Code
Function Space Particle Optimization for Bayesian Neural Networks

Ziyu Wang, Tongzheng Ren, Jun Zhu et al.

While Bayesian neural networks (BNNs) have drawn increasing attention, their posterior inference remains challenging, due to the high-dimensional and over-parameterized nature. To address this issue, several highly flexible and scalable variational inference procedures based on the idea of particle optimization have been proposed. These methods directly optimize a set of particles to approximate the target posterior. However, their application to BNNs often yields sub-optimal performance, as such methods have a particular failure mode on over-parameterized models. In this paper, we propose to solve this issue by performing particle optimization directly in the space of regression functions. We demonstrate through extensive experiments that our method successfully overcomes this issue, and outperforms strong baselines in a variety of tasks including prediction, defense against adversarial examples, and reinforcement learning.

1.7IRFeb 2, 2019
Joint Cluster Unary Loss for Efficient Cross-Modal Hashing

Shifeng Zhang, Jianmin Li, Bo Zhang

With the rapid growth of various types of multimodal data, cross-modal deep hashing has received broad attention for solving cross-modal retrieval problems efficiently. Most cross-modal hashing methods follow the traditional supervised hashing framework in which the $O(n^2)$ data pairs and $O(n^3)$ data triplets are generated for training, but the training procedure is less efficient because the complexity is high for large-scale dataset. To address these issues, we propose a novel and efficient cross-modal hashing algorithm in which the unary loss is introduced. First of all, We introduce the Cross-Modal Unary Loss (CMUL) with $O(n)$ complexity to bridge the traditional triplet loss and classification-based unary loss. A more accurate bound of the triplet loss for structured multilabel data is also proposed in CMUL. Second, we propose the novel Joint Cluster Cross-Modal Hashing (JCCH) algorithm for efficient hash learning, in which the CMUL is involved. The resultant hashcodes form several clusters in which the hashcodes in the same cluster share similar semantic information, and the heterogeneity gap on different modalities is diminished by sharing the clusters. The proposed algorithm is able to be applied to various types of data, and experiments on large-scale datasets show that the proposed method is superior over or comparable with state-of-the-art cross-modal hashing methods, and training with the proposed method is more efficient than others.

6.5CVFeb 2, 2019
Pairwise Teacher-Student Network for Semi-Supervised Hashing

Shifeng Zhang, Jianmin Li, Bo Zhang

Hashing method maps similar high-dimensional data to binary hashcodes with smaller hamming distance, and it has received broad attention due to its low storage cost and fast retrieval speed. Pairwise similarity is easily obtained and widely used for retrieval, and most supervised hashing algorithms are carefully designed for the pairwise supervisions. As labeling all data pairs is difficult, semi-supervised hashing is proposed which aims at learning efficient codes with limited labeled pairs and abundant unlabeled ones. Existing methods build graphs to capture the structure of dataset, but they are not working well for complex data as the graph is built based on the data representations and determining the representations of complex data is difficult. In this paper, we propose a novel teacher-student semi-supervised hashing framework in which the student is trained with the pairwise information produced by the teacher network. The network follows the smoothness assumption, which achieves consistent distances for similar data pairs so that the retrieval results are similar for neighborhood queries. Experiments on large-scale datasets show that the proposed method reaches impressive gain over the supervised baselines and is superior to state-of-the-art semi-supervised hashing methods.

7.1LGJan 24, 2019
To Relieve Your Headache of Training an MRF, Take AdVIL

Chongxuan Li, Chao Du, Kun Xu et al.

We propose a black-box algorithm called {\it Adversarial Variational Inference and Learning} (AdVIL) to perform inference and learning on a general Markov random field (MRF). AdVIL employs two variational distributions to approximately infer the latent variables and estimate the partition function of an MRF, respectively. The two variational distributions provide an estimate of the negative log-likelihood of the MRF as a minimax optimization problem, which is solved by stochastic gradient descent. AdVIL is proven convergent under certain conditions. On one hand, compared with contrastive divergence, AdVIL requires a minimal assumption about the model structure and can deal with a broader family of MRFs. On the other hand, compared with existing black-box methods, AdVIL provides a tighter estimate of the log partition function and achieves much better empirical results.

6.7MLOct 29, 2018Code
Semi-crowdsourced Clustering with Deep Generative Models

Yucen Luo, Tian Tian, Jiaxin Shi et al.

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.

15.1CVJun 7, 2018
Interlinked Convolutional Neural Networks for Face Parsing

Yisu Zhou, Xiaolin Hu, Bo Zhang

Face parsing is a basic task in face image analysis. It amounts to labeling each pixel with appropriate facial parts such as eyes and nose. In the paper, we present a interlinked convolutional neural network (iCNN) for solving this problem in an end-to-end fashion. It consists of multiple convolutional neural networks (CNNs) taking input in different scales. A special interlinking layer is designed to allow the CNNs to exchange information, enabling them to integrate local and contextual information efficiently. The hallmark of iCNN is the extensive use of downsampling and upsampling in the interlinking layers, while traditional CNNs usually uses downsampling only. A two-stage pipeline is proposed for face parsing and both stages use iCNN. The first stage localizes facial parts in the size-reduced image and the second stage labels the pixels in the identified facial parts in the original image. On a benchmark dataset we have obtained better results than the state-of-the-art methods.

1.1CVSep 28, 2016
Scalable Discrete Supervised Hash Learning with Asymmetric Matrix Factorization

Shifeng Zhang, Jianmin Li, Jinma Guo et al.

Hashing method maps similar data to binary hashcodes with smaller hamming distance, and it has received a broad attention due to its low storage cost and fast retrieval speed. However, the existing limitations make the present algorithms difficult to deal with large-scale datasets: (1) discrete constraints are involved in the learning of the hash function; (2) pairwise or triplet similarity is adopted to generate efficient hashcodes, resulting both time and space complexity are greater than O(n^2). To address these issues, we propose a novel discrete supervised hash learning framework which can be scalable to large-scale datasets. First, the discrete learning procedure is decomposed into a binary classifier learning scheme and binary codes learning scheme, which makes the learning procedure more efficient. Second, we adopt the Asymmetric Low-rank Matrix Factorization and propose the Fast Clustering-based Batch Coordinate Descent method, such that the time and space complexity is reduced to O(n). The proposed framework also provides a flexible paradigm to incorporate with arbitrary hash function, including deep neural networks and kernel methods. Experiments on large-scale datasets demonstrate that the proposed method is superior or comparable with state-of-the-art hashing algorithms.

2.1LGDec 7, 2015
Discriminative Nonparametric Latent Feature Relational Models with Data Augmentation

Bei Chen, Ning Chen, Jun Zhu et al.

We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction to automatically infer the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss with probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making restricting assumptions as done in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.

14.4LGNov 24, 2014
Big Learning with Bayesian Methods

Jun Zhu, Jianfei Chen, Wenbo Hu et al.

Explosive growth in data and availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems, and applications with Big Data. Bayesian methods represent one important class of statistic methods for machine learning, with substantial recent developments on adaptive, flexible and scalable Bayesian learning. This article provides a survey of the recent advances in Big learning with Bayesian methods, termed Big Bayesian Learning, including nonparametric Bayesian methods for adaptively inferring model complexity, regularized Bayesian inference for improving the flexibility via posterior regularization, and scalable algorithms and systems based on stochastic subsampling and distributed computing for dealing with large-scale applications.

15.9LGOct 9, 2013
Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

Jun Zhu, Xun Zheng, Bo Zhang

Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables. Our augment-and-collapse sampling algorithm has analytical forms of each conditional distribution without making any restricting assumptions and can be easily parallelized. Empirical results demonstrate significant improvements on prediction performance and time efficiency.