Ye Zhu

h-index15

4papers

4citations

Novelty50%

AI Score31

Ranked #129,611 of 194,257 authors (top 67%)#42,865 in CV (top 72%)

4 Papers

5.2CVDec 26, 2024Code

DAPoinTr: Domain Adaptive Point Transformer for Point Cloud Completion

Yinghui Li, Qianyu Zhou, Jingyu Gong et al.

Point Transformers (PoinTr) have shown great potential in point cloud completion recently. Nevertheless, effective domain adaptation that improves transferability toward target domains remains unexplored. In this paper, we delve into this topic and empirically discover that direct feature alignment on point Transformer's CNN backbone only brings limited improvements since it cannot guarantee sequence-wise domain-invariant features in the Transformer. To this end, we propose a pioneering Domain Adaptive Point Transformer (DAPoinTr) framework for point cloud completion. DAPoinTr consists of three key components: Domain Query-based Feature Alignment (DQFA), Point Token-wise Feature alignment (PTFA), and Voted Prediction Consistency (VPC). In particular, DQFA is presented to narrow the global domain gaps from the sequence via the presented domain proxy and domain query at the Transformer encoder and decoder, respectively. PTFA is proposed to close the local domain shifts by aligning the tokens, \emph{i.e.,} point proxy and dynamic query, at the Transformer encoder and decoder, respectively. VPC is designed to consider different Transformer decoders as multiple of experts (MoE) for ensembled prediction voting and pseudo-label generation. Extensive experiments with visualization on several domain adaptation benchmarks demonstrate the effectiveness and superiority of our DAPoinTr compared with state-of-the-art methods. Code will be publicly available at: https://github.com/Yinghui-Li-New/DAPoinTr

3.8CRJul 24, 2021

Breath to Pair (B2P): Respiration-Based Pairing Protocol for Wearable Devices

Jafar Pourbemany, Ye Zhu, Riccardo Bettati

We propose Breath to Pair (B2P), a protocol for pairing and shared-key generation for wearable devices that leverages the wearer's respiration activity to ensure that the devices are part of the same body-area network. We assume that the devices exploit different types of sensors to extract and process the respiration signal. We illustrate B2P for the case of two devices that use respiratory inductance plethysmography (RIP) and accelerometer sensors, respectively. Allowing for different types of sensors in pairing allows us to include wearable devices that use a variety of different sensors. In practice, this form of sensor variety creates a number of challenges that limit the ability of the shared-key establishment algorithm to generate matching keys. The two main obstacles are the lack of synchronization across the devices and the need for correct noise-induced mismatches between the generated key bit-strings. B2P addresses the synchronization challenge by utilizing Change Point Detection (CPD) to detect abrupt changes in the respiration signal and consider their occurrences as synchronizing points. Any potential mismatches are handled by optimal quantization and encoding of the respiration signal in order to maximize the error correction rate and minimize the message overheads. Extensive evaluation on a dataset collected from 30 volunteers demonstrates that our protocol can generate a secure 256-bit key every 2.85 seconds (around one breathing cycle). Particular attention is given to secure B2P against device impersonation attacks.

1.2LGFeb 21, 2020

Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Yuan Jin, He Zhao, Ming Liu et al.

Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers the mappings based on the explicit data information while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via the matrix factorization over the currently re-sampled user and item embeddings.

0.9CVNov 30, 2018

Multiview Based 3D Scene Understanding On Partial Point Sets

Ye Zhu, Sven Ewan Shepstone, Pablo Martínez-Nuevo et al.

Deep learning within the context of point clouds has gained much research interest in recent years mostly due to the promising results that have been achieved on a number of challenging benchmarks, such as 3D shape recognition and scene semantic segmentation. In many realistic settings however, snapshots of the environment are often taken from a single view, which only contains a partial set of the scene due to the field of view restriction of commodity cameras. 3D scene semantic understanding on partial point clouds is considered as a challenging task. In this work, we propose a processing approach for 3D point cloud data based on a multiview representation of the existing 360° point clouds. By fusing the original 360° point clouds and their corresponding 3D multiview representations as input data, a neural network is able to recognize partial point sets while improving the general performance on complete point sets, resulting in an overall increase of 31.9% and 4.3% in segmentation accuracy for partial and complete scene semantic understanding, respectively. This method can also be applied in a wider 3D recognition context such as 3D part segmentation.