Yongbin Sun

CV
7 papers
7,855 citations
Novelty 39%
AI Score 26

7 Papers

CV · Mar 11, 2020
Learning Diverse Fashion Collocation by Neural Graph Filtering

Xin Liu, Yongbin Sun, Ziwei Liu et al.

Fashion recommendation systems are highly desired by customers to find visually collocated fashion items, such as clothes, shoes, and bags. While existing methods demonstrate promising results, they still lack flexibility and diversity, e.g., by assuming a fixed number of items or by favoring safe but boring recommendations. In this paper, we propose a novel fashion collocation framework, Neural Graph Filtering, that models a flexible set of fashion items via a graph neural network. Specifically, we treat the visual embedding of each garment as a node in the graph and describe the inter-garment relationship as the edge between nodes. By applying symmetric operations on the edge vectors, this framework allows varying numbers of inputs/outputs and is invariant to their ordering. We further include a style classifier augmented with focal loss to enable the collocation of significantly diverse styles, which are inherently imbalanced in the training set. To facilitate a comprehensive study on diverse fashion collocation, we reorganize the Amazon Fashion dataset with carefully designed evaluation protocols. We evaluate the proposed approach on three popular benchmarks: the Polyvore dataset, the Polyvore-D dataset, and our reorganized Amazon Fashion dataset. Extensive experimental results show that our approach significantly outperforms the state-of-the-art methods, with improvements of over 10% on the standard AUC metric on the established tasks. More importantly, 82.5% of the users prefer our diverse-style recommendations over other alternatives in a real-world perception study.
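
The abstract names two mechanisms: order-invariant pooling over edge vectors and focal loss for imbalanced style classes. The small Python sketch below illustrates both. It is not the authors' implementation. The hand-written `edge_vector` (absolute difference plus element-wise product), mean pooling as the symmetric operation, and all helper names are assumptions made for illustration, whereas the actual model learns its edge representations. `focal_loss` follows the standard formulation -α(1 - p_t)^γ log p_t.

```python
import math

def edge_vector(a, b):
    # Hypothetical edge function, symmetric in (a, b): |a - b| and a * b per dimension.
    return [abs(x - y) for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]

def outfit_feature(embeddings):
    """Mean-pool edge vectors over all unordered item pairs (needs >= 2 items).

    Mean pooling is symmetric, so the result ignores item order and has a
    fixed size for any number of items. math.fsum is exactly rounded, so even
    the floating-point result does not depend on summation order.
    """
    n = len(embeddings)
    edges = [edge_vector(embeddings[i], embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    return [math.fsum(col) / len(edges) for col in zip(*edges)]

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    With gamma = 0 and alpha = 1 this is plain cross-entropy; gamma > 0
    down-weights well-classified (easy) samples.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

items = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]
print(outfit_feature(items) == outfit_feature(items[::-1]))  # True
print(len(outfit_feature(items + [[0.3, 0.3]])))             # 4
```

Because every pair is pooled, adding a fourth garment changes the inputs but not the feature size. This property is what lets the number of items vary.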

RO · Nov 18, 2019
A gamified simulator and physical platform for self-driving algorithm training and validation

Joshua E. Siegel, Georgios Pappas, Konstantinos Politopoulos et al.

We identify the need for a gamified self-driving simulator in which game mechanics encourage high-quality data capture, and we design and apply such a simulator to collect lane-following training data. The resulting synthetic data enables a Convolutional Neural Network (CNN) to drive an in-game vehicle. We simultaneously develop a physical test platform based on a radio-controlled vehicle and the Robot Operating System (ROS), and successfully transfer the simulation-trained model to the physical domain without modification. The cross-platform simulator facilitates unsupervised crowdsourcing, helping to collect diverse data that emulates complex, dynamic environments, infrequent events, and edge cases. The physical platform provides a low-cost solution for validating simulation-trained models or enabling rapid transfer learning, thereby improving the safety and resilience of self-driving algorithms.

CV · Oct 12, 2018
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention

Yongbin Sun, Yue Wang, Ziwei Liu et al.

Generating 3D point clouds is challenging yet highly desired. This work presents a novel autoregressive model, PointGrow, which can generate diverse and realistic point cloud samples from scratch or conditioned on semantic contexts. The model operates recurrently: each point is sampled from a conditional distribution given its previously generated points. This allows inter-point correlations to be well exploited and makes 3D shape generative processes easier to interpret. Since point cloud object shapes are typically encoded by long-range dependencies, we augment our model with dedicated self-attention modules to capture such relations. Extensive evaluations show that PointGrow achieves satisfactory performance on both unconditional and conditional point cloud generation tasks with respect to realism and diversity. Several important applications, such as unsupervised feature learning and shape arithmetic operations, are also demonstrated.
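
The sketch below gives a rough illustration of autoregressive point generation. It samples one point at a time, conditions each point on all earlier points, and uses scaled dot-product attention to summarize that history. Everything here is hypothetical. `toy_conditional` is a hand-made Gaussian stand-in for PointGrow's learned conditional distribution, and `attend` is plain, unparameterized attention rather than the paper's self-attention module.

```python
import math
import random

def attend(query, points):
    """Scaled dot-product attention with the points as both keys and values."""
    d = len(query)
    scores = [sum(q * p for q, p in zip(query, pt)) / math.sqrt(d) for pt in points]
    top = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * pt[i] for w, pt in zip(weights, points)) / total for i in range(d)]

def toy_conditional(prev, rng):
    """Hypothetical p(x_n | x_1, ..., x_{n-1}); not PointGrow's learned distribution."""
    if not prev:
        return tuple(rng.gauss(0.0, 1.0) for _ in range(3))  # first point: unconditional
    context = attend(prev[-1], prev)  # latest point attends over all earlier points
    return tuple(c + rng.gauss(0.0, 0.1) for c in context)

def sample_point_cloud(n, seed=0):
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Autoregressive step: condition only on points generated so far.
        points.append(toy_conditional(points, rng))
    return points

cloud = sample_point_cloud(16, seed=1)
print(len(cloud), len(cloud[0]))  # 16 3
```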

HC · Jun 2, 2018
X-Vision: An augmented vision tool with real-time sensing ability in tagged environments

Yongbin Sun, Sai Nithin R. Kantareddy, Rahul Bhattacharyya et al.

We present the concept of X-Vision, an enhanced Augmented Reality (AR)-based visualization tool with real-time sensing capability in a tagged environment. We envision that this type of tool will enhance user-environment interaction and improve productivity in factories, smart spaces, home and office environments, maintenance/facility rooms, operating theatres, etc. In this paper, we describe the design of this visualization system, which combines the object's pose information estimated by a depth camera with the object's ID and physical attributes captured by RFID tags. We built a physical prototype of the system that projects 3D holograms of common office/household objects, annotated with sensed information such as water level and temperature. The paper also discusses the quality metrics used to compare pose estimation algorithms for robust reconstruction of the object's 3D data.
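
The fusion step described above can be pictured as a join on the RFID tag ID. This is a hypothetical sketch. The `Pose` and `HologramLabel` types, the `fuse` function, and the assumption that each pose estimate is already associated with a tag ID are illustrative choices, not the X-Vision design.

```python
from dataclasses import dataclass, field

@dataclass
class Pose:
    position: tuple  # (x, y, z) in the depth camera frame (assumed metres)
    yaw_deg: float   # rotation about the vertical axis (simplified orientation)

@dataclass
class HologramLabel:
    tag_id: str
    name: str
    pose: Pose
    readings: dict = field(default_factory=dict)

def fuse(poses, rfid_reads, catalog):
    """Join depth-camera poses with RFID reads on the tag ID.

    poses:      tag_id -> Pose (assumes each tracked object is already matched to its tag)
    rfid_reads: tag_id -> sensed attributes, e.g. {"water_level": 0.4}
    catalog:    tag_id -> readable object name
    Objects seen by only one modality are skipped.
    """
    labels = []
    for tag_id in sorted(poses.keys() & rfid_reads.keys()):
        labels.append(HologramLabel(tag_id, catalog.get(tag_id, tag_id),
                                    poses[tag_id], dict(rfid_reads[tag_id])))
    return labels

poses = {"T1": Pose((0.1, 0.2, 1.0), 90.0), "T2": Pose((0.0, 0.0, 2.0), 0.0)}
reads = {"T1": {"water_level": 0.4, "temperature_c": 21.5}}
for label in fuse(poses, reads, {"T1": "water bottle"}):
    print(label.name, label.readings)
```

Joining on the tag ID keeps the two sensing paths independent. The depth camera supplies where an object is, and the tag supplies what it is and what it currently measures.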

CV · Apr 17, 2018
Im2Avatar: Colorful 3D Reconstruction from a Single Image

Yongbin Sun, Ziwei Liu, Yue Wang et al.

Existing works on single-image 3D reconstruction mainly focus on shape recovery. In this work, we study a new problem: simultaneously recovering 3D shape and surface color from a single image, namely "colorful 3D reconstruction". This problem is both challenging and intriguing because the ability to infer a textured 3D model from a single image is at the core of visual understanding. Here, we propose an end-to-end trainable framework, Colorful Voxel Network (CVN), to tackle this problem. Conditioned on a single 2D input, CVN learns to decompose the shape and surface color information of a 3D object into a 3D shape branch and a surface color branch, respectively. Specifically, for shape recovery, we generate a shape volume whose voxel states indicate occupancy. For surface color recovery, we combine the strengths of appearance hallucination and geometric projection by concurrently learning a regressed color volume and a 2D-to-3D flow volume, which are then fused into a blended color volume. The final textured 3D model is obtained by sampling colors from the blended color volume at the positions of occupied voxels in the shape volume. To handle the severely sparse volume representations, we design a novel loss function, Mean Squared False Cross-Entropy Loss (MSFCEL). Extensive experiments demonstrate that our approach achieves significant improvement over baselines and generalizes well across diverse object categories and arbitrary viewpoints.
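
The color pipeline in this abstract (shape volume, regressed colors, flow-based sampling, blending) can be sketched on a sparse dictionary of voxels. The sketch makes several assumptions that the abstract does not state: nearest-neighbour image sampling (a differentiable model would more likely sample bilinearly), a per-voxel blending weight `blend`, and a 0.5 occupancy threshold. All function names are hypothetical, and MSFCEL is not modeled.

```python
def sample_image(image, u, v):
    """Nearest-neighbour RGB lookup at pixel (u, v), clamped to the image bounds."""
    rows, cols = len(image), len(image[0])
    r = min(max(int(round(v)), 0), rows - 1)
    c = min(max(int(round(u)), 0), cols - 1)
    return image[r][c]

def textured_voxels(occupancy, regressed, flow, blend, image, threshold=0.5):
    """Colour the occupied voxels by blending regressed and flow-sampled colours.

    occupancy: voxel -> occupancy probability from the shape branch
    regressed: voxel -> RGB predicted directly (appearance hallucination)
    flow:      voxel -> (u, v) source pixel in the input image (2D-to-3D flow)
    blend:     voxel -> weight in [0, 1] given to the flow-sampled colour
    """
    colored = {}
    for voxel, p in occupancy.items():
        if p < threshold:
            continue  # colour is only read where the shape volume is occupied
        sampled = sample_image(image, *flow[voxel])
        w = blend[voxel]
        colored[voxel] = tuple(w * s + (1.0 - w) * r
                               for s, r in zip(sampled, regressed[voxel]))
    return colored

image = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]  # a 1 x 2 RGB image
out = textured_voxels(
    occupancy={(0, 0, 0): 0.9, (0, 0, 1): 0.1},
    regressed={(0, 0, 0): (0.0, 0.0, 1.0), (0, 0, 1): (0.5, 0.5, 0.5)},
    flow={(0, 0, 0): (1.0, 0.0), (0, 0, 1): (0.0, 0.0)},
    blend={(0, 0, 0): 0.5, (0, 0, 1): 0.5},
    image=image,
)
print(out)  # {(0, 0, 0): (0.0, 0.5, 0.5)}
```

The flow path copies visible colours faithfully, and the regressed path can fill in surfaces the camera never saw. A blend weight lets each voxel lean on whichever source is more reliable.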

HC · Apr 11, 2018
Visualization and Labeling of Point Clouds in Virtual Reality

Jonathan Dyssel Stets, Yongbin Sun, Wiley Corning et al.

We present a Virtual Reality (VR) application for labeling and handling point cloud data sets. A series of room-scale point clouds is recorded as a video sequence using a Microsoft Kinect. The data can be played and paused, and frames can be skipped, just like in a video player. The user can walk around and inspect the data while it is playing or paused. Using the tracked hand-held controller, the user can select and label individual parts of the point cloud. Labeled points are highlighted with a color. With a tracking algorithm, the labeled points can be tracked from frame to frame to ease the labeling process. Our sample data is an RGB point cloud recording of two people juggling with pins. Here, the user can select and label, for example, the juggling pins as shown in Figure 1. The pins are labeled with various colors to indicate different labels.
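
The abstract does not specify which tracking algorithm carries labels between frames. As a simple stand-in, not the paper's method, labels can be propagated by nearest-neighbour matching with a distance cutoff. `propagate_labels` and its `max_dist` parameter are hypothetical.

```python
import math

def propagate_labels(prev_points, prev_labels, next_points, max_dist=0.05):
    """Carry labels from one frame to the next by nearest-neighbour matching.

    prev_points / next_points: lists of (x, y, z); prev_labels: one label per
    previous point (None = unlabeled). A point in the next frame inherits the
    label of the closest labeled point if it lies within max_dist (same units
    as the coordinates); otherwise it stays None. This is brute force O(N*M);
    room-scale clouds would call for a spatial index such as a k-d tree.
    """
    labeled = [(p, l) for p, l in zip(prev_points, prev_labels) if l is not None]
    out = []
    for q in next_points:
        best_label, best_d = None, max_dist
        for p, label in labeled:
            d = math.dist(p, q)
            if d <= best_d:
                best_label, best_d = label, d
        out.append(best_label)
    return out

prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
labels = ["pin_red", None, "pin_blue"]
nxt = [(0.01, 0.0, 0.0), (1.0, 0.0, 0.0), (2.02, 0.0, 0.0)]
print(propagate_labels(prev, labels, nxt))  # ['pin_red', None, 'pin_blue']
```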

CV · Jan 24, 2018
Dynamic Graph CNN for Learning on Point Clouds

Yue Wang, Yongbin Sun, Ziwei Liu et al.

Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insights from CNNs to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv, suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: it incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems, affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks including ModelNet40, ShapeNetPart, and S3DIS.
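
For each point x_i, EdgeConv computes edge features h_Θ(x_i, x_j - x_i) over the point's k nearest neighbours and aggregates them with a channel-wise max. Recomputing the kNN graph from each layer's output is what makes the graph dynamic. The pure-Python sketch below instantiates h_Θ as a single linear layer followed by ReLU, using fixed, made-up weights. The function names and weights are illustrative only. A real implementation would use a tensor library and learned parameters.

```python
import math

def knn(points, k):
    """Indices of the k nearest neighbours of each point (excluding itself)."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        nbrs.append(order[:k])
    return nbrs

def edge_conv(features, k, weights):
    """One EdgeConv layer: x_i' = max over j in kNN(i) of ReLU(W [x_i, x_j - x_i]).

    features: list of D-dim vectors; weights: output rows, each of length 2*D.
    The kNN graph is rebuilt from `features`, so stacked layers use a dynamic graph.
    """
    out = []
    for i, nbrs in enumerate(knn(features, k)):
        xi = features[i]
        edge_feats = []
        for j in nbrs:
            e = list(xi) + [a - b for a, b in zip(features[j], xi)]
            edge_feats.append([max(0.0, sum(w * v for w, v in zip(row, e)))
                               for row in weights])
        out.append([max(col) for col in zip(*edge_feats)])  # channel-wise max over edges
    return out

pts = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.1, 2.0, 0.3), (0.5, 0.4, 3.0), (2.5, 1.7, 1.1)]
W1 = [[0.5, -0.2, 0.1, 0.3, 0.0, -0.4], [0.2, 0.2, 0.2, 0.2, 0.2, 0.2]]  # hypothetical
h1 = edge_conv(pts, 2, W1)                                     # graph built in xyz space
h2 = edge_conv(h1, 2, [[0.3, -0.1, 0.4, 0.2], [0.1, 0.5, -0.2, 0.3]])  # rebuilt in feature space
```

The max aggregation is symmetric, so each output does not depend on the order in which neighbours are visited. As a result, relabeling the input points simply relabels the outputs.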