Xuemin Hong

LG Dec 22, 2023

The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs

Junli Fang, João F. C. Mota, Baoshan Lu et al.

The joint source-channel coding (JSCC) framework leverages deep learning to learn from data the best codes for source and channel coding. When the output signal, rather than being binary, is directly mapped onto the IQ domain (complex-valued), we call the resulting framework joint source coding and modulation (JSCM). We consider a JSCM scenario and show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy, a tradeoff that we name RDPC. We then propose two image compression methods to navigate that tradeoff: the RDPCO algorithm which, under simple assumptions, directly solves the optimization problem characterizing the tradeoff, and an algorithm based on an inverse-domain generative adversarial network (ID-GAN), which is more general and achieves extreme compression. Simulation results corroborate the theoretical findings, showing that both algorithms exhibit the RDPC tradeoff. They also demonstrate that the proposed ID-GAN algorithm effectively balances image distortion, perception, and classification accuracy, and significantly outperforms traditional separation-based methods and recent deep JSCM architectures in terms of one or more of these metrics.

CV Jul 23, 2025

IONext: Unlocking the Next Era of Inertial Odometry

Shanshan Zhang, Qi Zhang, Siyue Wang et al.

Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNNs can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.
