CVNov 12, 2020
VCE: Variational Convertor-Encoder for One-Shot GeneralizationChengshuai Li, Shuai Han, Jianping Xing
Variational Convertor-Encoder (VCE) converts an image to various styles; we present this novel architecture for the problem of one-shot generalization and its transfer to new tasks not seen before without additional training. We also improve the performance of variational auto-encoder (VAE) to filter those blurred points using a novel algorithm proposed by us, namely large margin VAE (LMVAE). Two samples with the same property are input to the encoder, and then a convertor is required to processes one of them from the noisy outputs of the encoder; finally, the noise represents a variety of transformation rules and is used to convert new images. The algorithm that combines and improves the condition variational auto-encoder (CVAE) and introspective VAE, we propose this new framework aim to transform graphics instead of generating them; it is used for the one-shot generative process. No sequential inference algorithmic is needed in training. Compared to recent Omniglot datasets, the results show that our model produces more realistic and diverse images.
CVJan 26, 2020
SDOD:Real-time Segmenting and Detecting 3D Object by DepthShengjie Li, Caiyi Xu, Jianping Xing et al.
Most existing instance segmentation methods only focus on improving performance and are not suitable for real-time scenes such as autonomous driving. This paper proposes a real-time framework that segmenting and detecting 3D objects by depth. The framework is composed of two parallel branches: one for instance segmentation and another for object detection. We discretize the objects' depth into depth categories and transform the instance segmentation task into a pixel-level classification task. The Mask branch predicts pixel-level depth categories, and the 3D branch indicates instance-level depth categories. We produce an instance mask by assigning pixels which have the same depth categories to each instance. In addition, to solve the imbalance between mask labels and 3D labels in the KITTI dataset, we introduce a coarse mask generated by the auto-annotation model to increase samples. Experiments on the challenging KITTI dataset show that our approach outperforms LklNet about 1.8 times on the speed of segmentation and 3D detection.