PVINet: Point-Voxel Interlaced Network for Point Cloud Compression
This work addresses point cloud compression for applications such as 3D graphics and autonomous driving. The contribution is incremental: it builds on existing methods and improves how global and local features interact.
The paper proposes PVINet, a point cloud compression network that captures global structural and local contextual features in parallel and lets them interact at each scale. On benchmark datasets it performs competitively with state-of-the-art methods.
In point cloud compression, the quality of a reconstructed point cloud depends on both its global structure and its local context. Existing methods usually process global and local information sequentially, with no communication between the two. In this paper, we propose a point-voxel interlaced network (PVINet), which captures global structural features and local contextual features in parallel and performs interactions at each scale to enhance feature perception efficiency. Specifically, PVINet contains a voxel-based encoder (Ev) for extracting global structural features and a point-based encoder (Ep) that models local contexts centered at each voxel. In particular, a novel conditional sparse convolution is introduced, which uses point embeddings to dynamically customize kernels for voxel feature extraction, facilitating feature interactions from Ep to Ev. During decoding, a voxel-based decoder employs conditional sparse convolutions to incorporate point embeddings as guidance when reconstructing the point cloud. Experiments on benchmark datasets show that PVINet delivers performance competitive with state-of-the-art methods.
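To make the idea of a conditional sparse convolution concrete, here is a minimal pure-Python sketch. It is not the PVINet implementation: the abstract does not say how point embeddings customize the kernels, so the sketch assumes the simplest plausible form, a per-voxel, per-input-channel scaling of a shared 3x3x3 kernel. The names `channel_scales`, `conditional_sparse_conv`, `base_w`, and `mod_w` are hypothetical and chosen only for illustration.

```python
import itertools
import math

# 3x3x3 neighbourhood; index 13 is the centre offset (0, 0, 0).
OFFSETS = list(itertools.product((-1, 0, 1), repeat=3))


def channel_scales(embedding, mod_w):
    """Map a point embedding (length D) to per-input-channel scales in (0, 2).

    mod_w is a hypothetical D x C_in projection (an assumption of this sketch).
    A zero embedding yields a scale of exactly 1 for every channel.
    """
    c_in = len(mod_w[0])
    logits = [sum(e * row[c] for e, row in zip(embedding, mod_w))
              for c in range(c_in)]
    return [2.0 / (1.0 + math.exp(-z)) for z in logits]


def conditional_sparse_conv(voxels, point_emb, base_w, mod_w):
    """Sparse 3x3x3 convolution whose kernel is modulated per output voxel.

    voxels:    {(x, y, z): [C_in features]}, occupied voxels only.
    point_emb: {(x, y, z): [D values]}, one embedding per occupied voxel
               (assumed to come from a point-based encoder).
    base_w:    27 matrices of shape C_in x C_out, one per entry of OFFSETS.
    """
    c_out = len(base_w[0][0])
    out = {}
    for centre in voxels:
        scale = channel_scales(point_emb[centre], mod_w)
        acc = [0.0] * c_out
        for k, off in enumerate(OFFSETS):
            neighbour = tuple(a + b for a, b in zip(centre, off))
            feat = voxels.get(neighbour)
            if feat is None:
                continue  # sparse: empty neighbours contribute nothing
            # Scaling input channel c by scale[c] is equivalent to scaling
            # row c of base_w[k], i.e. using a kernel customised for this voxel.
            for c, (f, s) in enumerate(zip(feat, scale)):
                for o in range(c_out):
                    acc[o] += f * s * base_w[k][c][o]
        out[centre] = acc
    return out
```

With all-zero embeddings every scale equals 1 and the function reduces to an ordinary sparse convolution over occupied voxels, so the embedding acts purely as a conditioning signal on top of the shared kernel. A real system would more likely use a sparse tensor library and learned projections. The nested loops here only expose the data flow.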