Haoyu Li

h-index31

3papers

41citations

Novelty65%

AI Score32

Ranked #126,413 of 194,257 authors (top 65%)#3,153 in CR (top 47%)

3 Papers

2.2RODec 11, 2024

3DTTNet: Multimodal Fusion-Based 3D Traversable Terrain Modeling for Off-Road Environments

Zitong Chen, Chao Sun, Shida Nie et al.

Off-road environments remain significant challenges for autonomous ground vehicles, due to the lack of structured roads and the presence of complex obstacles, such as uneven terrain, vegetation, and occlusions. Traditional perception algorithms, primarily designed for structured environments, often fail in unstructured scenarios. In this paper, traversable area recognition is achieved through semantic scene completion. A novel multimodal method, 3DTTNet, is proposed to generate dense traversable terrain estimations by integrating LiDAR point clouds with monocular images from a forward-facing perspective. By integrating multimodal data, environmental feature extraction is strengthened, which is crucial for accurate terrain modeling in complex terrains. Furthermore, RELLIS-OCC, a dataset with 3D traversable annotations, is introduced, incorporating geometric features such as step height, slope, and unevenness. Through a comprehensive analysis of vehicle obsta cle-crossing conditions and the incorporation of vehicle body structure constraints, four traversability cost labels are generated: lethal, medium-cost, low-cost, and free. Experimental results demonstrate that 3DTTNet outperforms the comparison approaches in 3D traversable area recognition, particularly in off-road environments with irregular geometries and partial occlusions. Specifically, 3DTTNet achieves a 42\% improvement in scene completion IoU compared to other models. The proposed framework is scalable and adaptable to various vehicle platforms, allowing for adjustments to occupancy grid parameters and the integration of advanced dynamic models for traversability cost estimation.

10.8ASMay 16, 2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Yi Zhao, Haoyu Li, Cheng-I Lai et al.

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has previously modeled individual types of speech features, such as only phones or only F0. This paper introduces an important extension to VQ-VAE for learning F0-related suprasegmental information simultaneously along with traditional phone features.The proposed framework uses two encoders such that the F0 trajectory and speech waveform are both input to the system, therefore two separate codebooks are learned. We used a WaveRNN vocoder as the decoder component of VQ-VAE. Our speaker-independent VQ-VAE was trained with raw speech waveforms from multi-speaker Japanese speech databases. Experimental results show that the proposed extension reduces F0 distortion of reconstructed speech for all unseen test speakers, and results in significantly higher preference scores from a listening test. We additionally conducted experiments using single-speaker Mandarin speech to demonstrate advantages of our architecture in another language which relies heavily on F0.

7.5CROct 26, 2016

Volumetric Light-field Encryption at the Microscopic Scale

Haoyu Li, Changliang Guo, Inbarasan Muniraj et al.

We report a light-field based method that allows the optical encryption of three-dimensional (3D) volumetric information at the microscopic scale in a single 2D light-field image. The system consists of a microlens array and an array of random phase/amplitude masks. The method utilizes a wave optics model to account for the dominant diffraction effect at this new scale, and the system point-spread function (PSF) serves as the key for encryption and decryption. We successfully developed and demonstrated a deconvolution algorithm to retrieve spatially multiplexed discrete and continuous volumetric data from 2D light-field images. Showing that the method is practical for data transmission and storage, we obtained a faithful reconstruction of the 3D volumetric information from a digital copy of the encrypted light-field image. The method represents a new level of optical encryption, paving the way for broad industrial and biomedical applications in processing and securing 3D data at the microscopic scale.