Giuseppe Valenzise

CV
h-index14
19papers
885citations
Novelty52%
AI Score40

19 Papers

SPJul 10, 2024
Generative AI for RF Sensing in IoT systems

Li Wang, Chao Zhang, Qiyang Zhao et al.

The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significant challenges, including noise, interference, incomplete data, and high deployment costs, which limit their effectiveness and scalability. This paper investigates the potential of Generative AI (GenAI) to overcome these limitations within the IoT ecosystem. We provide a comprehensive review of state-of-the-art GenAI techniques, focusing on their application to RF sensing problems. By generating high-quality synthetic data, enhancing signal quality, and integrating multi-modal data, GenAI offers robust solutions for RF environment reconstruction, localization, and imaging. Additionally, GenAI's ability to generalize enables IoT devices to adapt to new environments and unseen tasks, improving their efficiency and performance. The main contributions of this article include a detailed analysis of the challenges in RF sensing, the presentation of innovative GenAI-based solutions, and the proposal of a unified framework for diverse RF sensing tasks. Through case studies, we demonstrate the effectiveness of integrating GenAI models, leading to advanced, scalable, and intelligent IoT systems.

CVNov 4, 2022
PCQA-GRAPHPOINT: Efficients Deep-Based Graph Metric For Point Cloud Quality Assessment

Marouane Tliba, Aladine Chetouani, Giuseppe Valenzise et al.

Following the advent of immersive technologies and the increasing interest in representing interactive geometrical format, 3D Point Clouds (PC) have emerged as a promising solution and effective means to display 3D visual information. In addition to other challenges in immersive applications, objective and subjective quality assessments of compressed 3D content remain open problems and an area of research interest. Yet most of the efforts in the research area ignore the local geometrical structures between points representation. In this paper, we overcome this limitation by introducing a novel and efficient objective metric for Point Clouds Quality Assessment, by learning local intrinsic dependencies using Graph Neural Network (GNN). To evaluate the performance of our method, two well-known datasets have been used. The results demonstrate the effectiveness and reliability of our solution compared to state-of-the-art metrics.

CVJul 9, 2023
Predictive Coding For Animation-Based Video Compression

Goluck Konuko, Stéphane Lathuilière, Giuseppe Valenzise

We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrate by representing face motions with a compact set of sparse keypoints. However, these methods encode video in a frame-by-frame fashion, i.e. each frame is reconstructed from a reference frame, which limits the reconstruction quality when the bandwidth is larger. Instead, we propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame. The residuals can be in turn coded in a predictive manner, thus removing efficiently temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a datasetof talking-head videos

CVMar 15, 2023
Quality evaluation of point clouds: a novel no-reference approach using transformer-based architecture

Marouane Tliba, Aladine Chetouani, Giuseppe Valenzise et al.

With the increased interest in immersive experiences, point cloud came to birth and was widely adopted as the first choice to represent 3D media. Besides several distortions that could affect the 3D content spanning from acquisition to rendering, efficient transmission of such volumetric content over traditional communication systems stands at the expense of the delivered perceptual quality. To estimate the magnitude of such degradation, employing quality metrics became an inevitable solution. In this work, we propose a novel deep-based no-reference quality metric that operates directly on the whole point cloud without requiring extensive pre-processing, enabling real-time evaluation over both transmission and rendering levels. To do so, we use a novel model design consisting primarily of cross and self-attention layers, in order to learn the best set of local semantic affinities while keeping the best combination of geometry and color information in multiple levels from basic features extraction to deep representation modeling.

CVMay 19, 2025Code
Denoising Diffusion Probabilistic Model for Point Cloud Compression at Low Bit-Rates

Gabriele Spadaro, Alberto Presta, Jhony H. Giraldo et al.

Efficient compression of low-bit-rate point clouds is critical for bandwidth-constrained applications. However, existing techniques mainly focus on high-fidelity reconstruction, requiring many bits for compression. This paper proposes a "Denoising Diffusion Probabilistic Model" (DDPM) architecture for point cloud compression (DDPM-PCC) at low bit-rates. A PointNet encoder produces the condition vector for the generation, which is then quantized via a learnable vector quantizer. This configuration allows to achieve a low bitrates while preserving quality. Experiments on ShapeNet and ModelNet40 show improved rate-distortion at low rates compared to standardized and state-of-the-art approaches. We publicly released the code at https://github.com/EIDOSLAB/DDPM-PCC.

IVApr 20, 2021Code
Multiscale deep context modeling for lossless point cloud geometry compression

Dat Thanh Nguyen, Maurice Quach, Giuseppe Valenzise et al.

We propose a practical deep generative approach for lossless point cloud geometry compression, called MSVoxelDNN, and show that it significantly reduces the rate compared to the MPEG G-PCC codec. Our previous work based on autoregressive models (VoxelDNN) has a fast training phase, however, inference is slow as the occupancy probabilities are predicted sequentially, voxel by voxel. In this work, we employ a multiscale architecture which models voxel occupancy in coarse-to-fine order. At each scale, MSVoxelDNN divides voxels into eight conditionally independent groups, thus requiring a single network evaluation per group instead of one per voxel. We evaluate the performance of MSVoxelDNN on a set of point clouds from Microsoft Voxelized Upper Bodies (MVUB) and MPEG, showing that the current method speeds up encoding/decoding times significantly compared to the previous VoxelDNN, while having average rate saving over G-PCC of 17.5%. The implementation is available at https://github.com/Weafre/MSVoxelDNN.

CVFeb 25, 2021Code
A deep perceptual metric for 3D point clouds

Maurice Quach, Aladine Chetouani, Giuseppe Valenzise et al.

Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of typical voxel-based loss functions employed to train these networks. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at https://github.com/mauriceqch/2021_pc_perceptual_loss.

IVNov 30, 2020Code
Learning-based lossless compression of 3D point cloud geometry

Dat Thanh Nguyen, Maurice Quach, Giuseppe Valenzise et al.

This paper presents a learning-based, lossless compression method for static point cloud geometry, based on context-adaptive arithmetic coding. Unlike most existing methods working in the octree domain, our encoder operates in a hybrid mode, mixing octree and voxel-based coding. We adaptively partition the point cloud into multi-resolution voxel blocks according to the point cloud structure, and use octree to signal the partitioning. On the one hand, octree representation can eliminate the sparsity in the point cloud. On the other hand, in the voxel domain, convolutions can be naturally expressed, and geometric information (i.e., planes, surfaces, etc.) is explicitly processed by a neural network. Our context model benefits from these properties and learns a probability distribution of the voxels using a deep convolutional neural network with masked filters, called VoxelDNN. Experiments show that our method outperforms the state-of-the-art MPEG G-PCC standard with average rate savings of 28% on a diverse set of point clouds from the Microsoft Voxelized Upper Bodies (MVUB) and MPEG. The implementation is available at https://github.com/Weafre/VoxelDNN.

CVJun 16, 2020Code
Improved Deep Point Cloud Geometry Compression

Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Point clouds have been recognized as a crucial data structure for 3D content and are essential in a number of applications such as virtual and mixed reality, autonomous driving, cultural heritage, etc. In this paper, we propose a set of contributions to improve deep point cloud compression, i.e.: using a scale hyperprior model for entropy coding; employing deeper transforms; a different balancing weight in the focal loss; optimal thresholding for decoding; and sequential model training. In addition, we present an extensive ablation study on the impact of each of these factors, in order to provide a better understanding about why they improve RD performance. An optimal combination of the proposed improvements achieves BD-PSNR gains over G-PCC trisoup and octree of 5.50 (6.48) dB and 6.84 (5.95) dB, respectively, when using the point-to-point (point-to-plane) metric. Code is available at https://github.com/mauriceqch/pcc_geo_cnn_v2 .

CVMar 20, 2019Code
Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression

Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn .

CVJun 9, 2025
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding

Bolin Chen, Shanzhi Yin, Goluck Konuko et al.

The rise of deep generative models has greatly advanced video compression, reshaping the paradigm of face video coding through their powerful capability for semantic-aware representation and lifelike synthesis. Generative Face Video Coding (GFVC) stands at the forefront of this revolution, which could characterize complex facial dynamics into compact latent codes for bitstream compactness at the encoder side and leverages powerful deep generative models to reconstruct high-fidelity face signal from the compressed latent codes at the decoder side. As such, this well-designed GFVC paradigm could enable high-fidelity face video communication at ultra-low bitrate ranges, far surpassing the capabilities of the latest Versatile Video Coding (VVC) standard. To pioneer foundational research and accelerate the evolution of GFVC, this paper presents the first comprehensive survey of GFVC technologies, systematically bridging critical gaps between theoretical innovation and industrial standardization. In particular, we first review a broad range of existing GFVC methods with different feature representations and optimization strategies, and conduct a thorough benchmarking analysis. In addition, we construct a large-scale GFVC-compressed face video database with subjective Mean Opinion Scores (MOSs) based on human perception, aiming to identify the most appropriate quality metrics tailored to GFVC. Moreover, we summarize the GFVC standardization potentials with a unified high-level syntax and develop a low-complexity GFVC system which are both expected to push forward future practical deployments and applications. Finally, we envision the potential of GFVC in industrial applications and deliberate on the current challenges and future opportunities.

CVMar 4, 2025
Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation

Zhichao Yang, Leida Li, Pengfei Chen et al.

Contrastive vision-language models, such as CLIP, have demonstrated excellent zero-shot capability across semantic recognition tasks, mainly attributed to the training on a large-scale I&1T (one Image with one Text) dataset. This kind of multimodal representations often blend semantic and perceptual elements, placing a particular emphasis on semantics. However, this could be problematic for popular tasks like image quality assessment (IQA) and conditional image generation (CIG), which typically need to have fine control on perceptual and semantic features. Motivated by the above facts, this paper presents a new multimodal disentangled representation learning framework, which leverages disentangled text to guide image disentanglement. To this end, we first build an I&2T (one Image with a perceptual Text and a semantic Text) dataset, which consists of disentangled perceptual and semantic text descriptions for an image. Then, the disentangled text descriptions are utilized as supervisory signals to disentangle pure perceptual representations from CLIP's original `coarse' feature space, dubbed DeCLIP. Finally, the decoupled feature representations are used for both image quality assessment (technical quality and aesthetic quality) and conditional image generation. Extensive experiments and comparisons have demonstrated the advantages of the proposed method on the two popular tasks. The dataset, code, and model will be available.

IVJul 1, 2021
Lossless Coding of Point Cloud Geometry using a Deep Generative Model

Dat Thanh Nguyen, Maurice Quach, Giuseppe Valenzise et al.

This paper proposes a lossless point cloud (PC) geometry compression method that uses neural networks to estimate the probability distribution of voxel occupancy. First, to take into account the PC sparsity, our method adaptively partitions a point cloud into multiple voxel block sizes. This partitioning is signalled via an octree. Second, we employ a deep auto-regressive generative model to estimate the occupancy probability of each voxel given the previously encoded ones. We then employ the estimated probabilities to code efficiently a block using a context-based arithmetic coder. Our context has variable size and can expand beyond the current block to learn more accurate probabilities. We also consider using data augmentation techniques to increase the generalization capability of the learned probability models, in particular in the presence of noise and lower-density point clouds. Experimental evaluation, performed on a variety of point clouds from four different datasets and with diverse characteristics, demonstrates that our method reduces significantly (by up to 30%) the rate for lossless coding compared to the state-of-the-art MPEG codec.

CVDec 1, 2020
Ultra-low bitrate video conferencing using deep image animation

Goluck Konuko, Giuseppe Valenzise, Stéphane Lathuilière

In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side. The overall system is trained in an end-to-end fashion minimizing a reconstruction error on the encoder output. Objective and subjective quality evaluation experiments demonstrate that the proposed approach provides an average bitrate reduction for the same visual quality of more than 80% compared to HEVC.

IVFeb 11, 2020
Folding-based compression of point cloud attributes

Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Existing techniques to compress point cloud attributes leverage either geometric or video-based compression tools. We explore a radically different approach inspired by recent advances in point cloud representation learning. Point clouds can be interpreted as 2D manifolds in 3D space. Specifically, we fold a 2D grid onto a point cloud and we map attributes from the point cloud onto the folded 2D grid using a novel optimized mapping method. This mapping results in an image, which opens a way to apply existing image processing techniques on point cloud attributes. However, as this mapping process is lossy in nature, we propose several strategies to refine it so that attributes can be mapped to the 2D grid with minimal distortion. Moreover, this approach can be flexibly applied to point cloud patches in order to better adapt to local geometric complexity. In this work, we consider point cloud attribute compression; thus, we compress this image with a conventional 2D image codec. Our preliminary results show that the proposed folding-based coding scheme can already reach performance similar to the latest MPEG Geometry-based PCC (G-PCC) codec.

IVNov 27, 2019
Potential of deep features for opinion-unaware, distortion-unaware, no-reference image quality assessment

Subhayan Mukherjee, Giuseppe Valenzise, Irene Cheng

Image Quality Assessment algorithms predict a quality score for a pristine or distorted input image, such that it correlates with human opinion. Traditional methods required a non-distorted "reference" version of the input image to compare with, in order to predict this score. However, recent "No-reference" methods circumvent this requirement by modelling the distribution of clean image features, thereby making them more suitable for practical use. However, majority of such methods either use hand-crafted features or require training on human opinion scores (supervised learning), which are difficult to obtain and standardise. We explore the possibility of using deep features instead, particularly, the encoded (bottleneck) feature maps of a Convolutional Autoencoder neural network architecture. Also, we do not train the network on subjective scores (unsupervised learning). The primary requirements for an IQA method are monotonic increase in predicted scores with increasing degree of input image distortion, and consistent ranking of images with the same distortion type and content, but different distortion levels. Quantitative experiments using the Pearson, Kendall and Spearman correlation scores on a diverse set of images show that our proposed method meets the above requirements better than the state-of-art method (which uses hand-crafted features) for three types of distortions: blurring, noise and compression artefacts. This demonstrates the potential for future research in this relatively unexplored sub-area within IQA.

IVAug 12, 2019
Deep Tone Mapping Operator for High Dynamic Range Images

Aakanksha Rana, Praveer Singh, Giuseppe Valenzise et al.

A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited number of HDR content and require an extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution and high-subjective quality tone mapped output. Based on conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic-content (e.g., outdoor, indoor, human, structures, etc.) but also tackles the HDR related scene-specific challenges such as contrast and brightness, while preserving the fine-grained details. We explore 4 possible combinations of Generator-Discriminator architectural designs to specifically address some prominent issues in HDR related deep-learning frameworks like blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss-functions and normalization layers under a cGAN setting, we conclude with adopting a multi-scale model for our task. To further leverage on the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study which confirms the versatility of our method.

MMMar 11, 2018
Learning Local Distortion Visibility From Image Quality Data-sets

Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux et al.

Accurate prediction of local distortion visibility thresholds is critical in many image and video processing applications. Existing methods require an accurate modeling of the human visual system, and are derived through pshycophysical experiments with simple, artificial stimuli. These approaches, however, are difficult to generalize to natural images with complex types of distortion. In this paper, we explore a different perspective, and we investigate whether it is possible to learn local distortion visibility from image quality scores. We propose a convolutional neural network based optimization framework to infer local detection thresholds in a distorted image. Our model is trained on multiple quality datasets, and the results are correlated with empirical visibility thresholds collected on complex stimuli in a recent study. Our results are comparable to state-of-the-art mathematical models that were trained on phsycovisual data directly. This suggests that it is possible to predict psychophysical phenomena from visibility information embedded in image quality scores.

MMDec 20, 2017
Blind High Dynamic Range Quality estimation by disentangling perceptual and noise features in images

Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux et al.

Assessing the visual quality of High Dynamic Range (HDR) images is an unexplored and an interesting research topic that has become relevant with the current boom in HDR technology. We propose a new convolutional neural network based model for No reference image quality assessment(NR-IQA) on HDR data. This model predicts the amount and location of noise, perceptual influence of image pixels on the noise, and the perceived quality, of a distorted image without any reference image. The proposed model extracts numerical values corresponding to the noise present in any given distorted image, and the perceptual effects exhibited by a human eye when presented with the same. These two measures are extracted separately yet sequentially and combined in a mixing function to compute the quality of the distorted image perceived by a human eye. Our training process derives the the component that computes perceptual effects from a real world image quality dataset, rather than using results of psycovisual experiments. With the proposed model, we demonstrate state of the art performance for HDR NR-IQA and our results show performance similar to HDR Full Reference Image Quality Assessment algorithms (FR-IQA).