Dandan Ding

CV
h-index15
9papers
514citations
Novelty59%
AI Score50

9 Papers

CVJan 28, 2023
Dynamic Point Cloud Geometry Compression Using Multiscale Inter Conditional Coding

Jianqiang Wang, Dandan Ding, Hao Chen et al.

This work extends the Multiscale Sparse Representation (MSR) framework developed for static Point Cloud Geometry Compression (PCGC) to support the dynamic PCGC through the use of multiscale inter conditional coding. To this end, the reconstruction of the preceding Point Cloud Geometry (PCG) frame is progressively downscaled to generate multiscale temporal priors which are then scale-wise transferred and integrated with lower-scale spatial priors from the same frame to form the contextual information to improve occupancy probability approximation when processing the current PCG frame from one scale to another. Following the Common Test Conditions (CTC) defined in the standardization committee, the proposed method presents State-Of-The-Art (SOTA) compression performance, yielding 78% lossy BD-Rate gain to the latest standard-compliant V-PCC and 45% lossless bitrate reduction to the latest G-PCC. Even for recently-emerged learning-based solutions, our method still shows significant performance gains.

CVSep 17, 2022
CARNet:Compression Artifact Reduction for Point Cloud Attribute

Dandan Ding, Junzhe Zhang, Jianqiang Wang et al.

A learning-based adaptive loop filter is developed for the Geometry-based Point Cloud Compression (G-PCC) standard to reduce attribute compression artifacts. The proposed method first generates multiple Most-Probable Sample Offsets (MPSOs) as potential compression distortion approximations, and then linearly weights them for artifact mitigation. As such, we drive the filtered reconstruction as close to the uncompressed PCA as possible. To this end, we devise a Compression Artifact Reduction Network (CARNet) which consists of two consecutive processing phases: MPSOs derivation and MPSOs combination. The MPSOs derivation uses a two-stream network to model local neighborhood variations from direct spatial embedding and frequency-dependent embedding, where sparse convolutions are utilized to best aggregate information from sparsely and irregularly distributed points. The MPSOs combination is guided by the least square error metric to derive weighting coefficients on the fly to further capture content dynamics of input PCAs. The CARNet is implemented as an in-loop filtering tool of the GPCC, where those linear weighting coefficients are encapsulated into the bitstream with negligible bit rate overhead. Experimental results demonstrate significant improvement over the latest GPCC both subjectively and objectively.

55.8CVMay 2
PACE: Post-Causal Entropy Modeling for Learned LiDAR Point Cloud Compression

Jiahao Zhu, Kang You, Dandan Ding et al.

LiDAR point cloud compression is vital for autonomous systems to handle massive data from high-resolution sensors. While learned entropy modeling built upon octree structures yields high compression gains, it faces two critical bottlenecks: 1) prohibitive latency, particularly during decoding, caused by causal, multi-stage context modeling; and 2) a rigid performance-latency trade-off, preventing a single model from adapting to varying constraints. These limitations stem from the tight coupling between context aggregation backbone and probability prediction. To address this, we propose PACE, a new framework that reformulates ancestral context aggregation as a non-causal backbone and confines causality to a lightweight, stage-scalable predictor, eliminating repetitive backbone executions and reducing computational overhead. The predictor supports an arbitrary number of prediction stages, supporting seamless adaptation across diverse performance-latency trade-offs without reloading parameters. Experiments demonstrate that PACE sets a new state-of-the-art in compression efficiency, achieving notable BD-BR savings and reducing decoding latency by over 90% in autoregressive mode, highly attractive for practical applications.

CVMar 16, 2025Code
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

Despite the substantial advancements demonstrated by learning-based neural models in the LiDAR Point Cloud Compression (LPCC) task, realizing real-time compression - an indispensable criterion for numerous industrial applications - remains a formidable challenge. This paper proposes RENO, the first real-time neural codec for 3D LiDAR point clouds, achieving superior performance with a lightweight model. RENO skips the octree construction and directly builds upon the multiscale sparse tensor representation. Instead of the multi-stage inferring, RENO devises sparse occupancy codes, which exploit cross-scale correlation and derive voxels' occupancy in a one-shot manner, greatly saving processing time. Experimental results demonstrate that the proposed RENO achieves real-time coding speed, 10 fps at 14-bit depth on a desktop platform (e.g., one RTX 3090 GPU) for both encoding and decoding processes, while providing 12.25% and 48.34% bit-rate savings compared to G-PCCv23 and Draco, respectively, at a similar quality. RENO model size is merely 1MB, making it attractive for practical applications. The source code is available at https://github.com/NJUVISION/RENO.

CVApr 21, 2024
Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Kang You, Kai Liu, Li Yu et al.

Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance and extremely low-decoding-latency simultaneously. Inspired by conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates with fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners.

CVMay 14, 2025
Efficient LiDAR Reflectance Compression via Scanning Serialization

Jiahao Zhu, Kang You, Dandan Ding et al.

Reflectance attributes in LiDAR point clouds provide essential information for downstream tasks but remain underexplored in neural compression methods. To address this, we introduce SerLiC, a serialization-based neural compression framework to fully exploit the intrinsic characteristics of LiDAR reflectance. SerLiC first transforms 3D LiDAR point clouds into 1D sequences via scan-order serialization, offering a device-centric perspective for reflectance analysis. Each point is then tokenized into a contextual representation comprising its sensor scanning index, radial distance, and prior reflectance, for effective dependencies exploration. For efficient sequential modeling, Mamba is incorporated with a dual parallelization scheme, enabling simultaneous autoregressive dependency capture and fast processing. Extensive experiments demonstrate that SerLiC attains over 2x volume reduction against the original reflectance data, outperforming the state-of-the-art method by up to 22% reduction of compressed bits while using only 2% of its parameters. Moreover, a lightweight version of SerLiC achieves > 10 fps (frames per second) with just 111K parameters, which is attractive for real-world applications.

CVNov 20, 2021
Sparse Tensor-based Multiscale Representation for Point Cloud Geometry Compression

Jianqiang Wang, Dandan Ding, Zhu Li et al.

This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also allows us to compress scale-wise MP-POVs by exploiting cross-scale and same-scale correlations extensively and flexibly. The overall compression efficiency highly depends on the accuracy of estimated occupancy probability for each MP-POV. Thus, we first design the Sparse Convolution-based Neural Network (SparseCNN) which stacks sparse convolutions and voxel sampling to best characterize and embed spatial correlations. We then develop the SparseCNN-based Occupancy Probability Approximation (SOPA) model to estimate the occupancy probability either in a single-stage manner only using the cross-scale correlation, or in a multi-stage manner by exploiting stage-wise correlation among same-scale neighbors. Besides, we also suggest the SparseCNN based Local Neighborhood Embedding (SLNE) to aggregate local variations as spatial priors in feature attribute to improve the SOPA. Our unified approach not only shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets including the dense object PCGs (8iVFB, Owlii, MUVB) and sparse LiDAR PCGs (KITTI, Ford) when compared with standardized MPEG G-PCC and other prevalent learning-based schemes, but also has low complexity which is attractive to practical applications.

CVDec 1, 2020
Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

Ming Lu, Tong Chen, Dandan Ding et al.

Inspired by the facts that retinal cells actually segregate the visual scene into different attributes (e.g., spatial details, temporal motion) for respective neuronal processing, we propose to first decompose the input video into respective spatial texture frames (STF) at its native spatial resolution that preserve the rich spatial details, and the other temporal motion frames (TMF) at a lower spatial resolution that retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize decoded STFs and TMFs for high-fidelity video reconstruction at the same resolution as its native input. This work simply applies the bicubic resampling in decomposition and HEVC compliant codec in compression, and puts the focus on the synthesis part. For resolution-adaptive synthesis, a motion compensation network (MCN) is devised on TMFs to efficiently align and aggregate temporal motion features that will be jointly processed with corresponding STFs using a non-local texture transfer network (NL-TTN) to better augment spatial details, by which the compression and resolution resampling noises can be effectively alleviated with better rate-distortion efficiency. Such "Decomposition, Compression, Synthesis (DCS)" based scheme is codec agnostic, currently exemplifying averaged $\approx$1 dB PSNR gain or $\approx$25% BD-rate saving, against the HEVC anchor using reference software. In addition, experimental comparisons to the state-of-the-art methods and ablation studies are conducted to further report the efficiency and generalization of DCS algorithm, promising an encouraging direction for future video coding.

IVNov 7, 2020
Multiscale Point Cloud Geometry Compression

Jianqiang Wang, Dandan Ding, Zhu Li et al.

Recent years have witnessed the growth of point cloud based applications because of its realistic and fine-grained representation of 3D objects and scenes. However, it is a challenging problem to compress sparse, unstructured, and high-precision 3D points for efficient communication. In this paper, leveraging the sparsity nature of point cloud, we propose a multiscale end-to-end learning framework which hierarchically reconstructs the 3D Point Cloud Geometry (PCG) via progressive re-sampling. The framework is developed on top of a sparse convolution based autoencoder for point cloud compression and reconstruction. For the input PCG which has only the binary occupancy attribute, our framework translates it to a downscaled point cloud at the bottleneck layer which possesses both geometry and associated feature attributes. Then, the geometric occupancy is losslessly compressed using an octree codec and the feature attributes are lossy compressed using a learned probabilistic context model.Compared to state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based PCC (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG), our method achieves more than 40% and 70% BD-Rate (Bjontegaard Delta Rate) reduction, respectively. Its encoding runtime is comparable to that of G-PCC, which is only 1.5% of V-PCC.