96.1 IT Apr 13
Channel-Aware Preemptive Scheduling for Semantic Communication with Truncated Diffusion and Path Compensation
Chengyang Liang, Dong Li
Semantic communication (SemCom) presents a transformative paradigm for alleviating bandwidth limitations in mobile networks by transmitting task-relevant semantic features instead of raw data bits. While SemCom systems utilizing diffusion models achieve superior generation quality, existing research treats semantic generation and wireless transmission as temporally independent processes. This separation neglects the intrinsic conflict between the multi-step iterative delays inherent in diffusion models and the time-varying fading characteristics of wireless channels. To address this discrepancy, this paper proposes a channel-aware preemptive scheduling framework with truncated diffusion and path compensation (CAPS-TDPC). In contrast to conventional methods, which require the generation phase to complete before transmission, the proposed framework implements a channel-driven scheduling mechanism: each user maintains a countdown inversely proportional to its instantaneous channel gain, and the user with the shortest countdown transmits immediately, regardless of whether its diffusion process has completed. This design permits the interruption of the forward diffusion process to enable early transmission under favorable channel conditions. In addition, a receiver-side compensation mechanism grounded in path dynamics is developed to mitigate the semantic loss resulting from such interruptions. A path deficit metric is proposed at the receiver to quantify the recovery difficulty of distinct image blocks by incorporating the velocity field of the inverse dynamics model, which allows for adaptive weighted inverse sampling. Experimental evaluations demonstrate that the proposed framework substantially reduces end-to-end latency while maintaining high-fidelity semantic reconstruction, thereby enhancing system robustness in fast-fading channel environments.
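The countdown rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `schedule_slot`, the proportionality constant `scale`, and the per-user step counter `steps_done` are hypothetical, and the abstract does not say how ties or near-zero gains are handled.

```python
def schedule_slot(channel_gains, steps_done, scale=1.0):
    """Pick the user whose countdown expires first.

    channel_gains: instantaneous channel gain per user (positive floats).
    steps_done:    forward diffusion steps each user has completed so far.
    scale:         hypothetical constant, countdown = scale / gain.
    Returns (winner index, diffusion steps completed at preemption).
    """
    # Countdown is inversely proportional to the channel gain; the small
    # floor only guards against division by zero (an assumption).
    countdowns = [scale / max(g, 1e-12) for g in channel_gains]
    winner = min(range(len(countdowns)), key=lambda i: countdowns[i])
    # The winner transmits now, so its diffusion is truncated at
    # whatever step it has reached.
    return winner, steps_done[winner]


gains = [0.2, 1.5, 0.7]   # user 1 currently has the best channel
steps = [4, 2, 9]         # user 1 has completed only 2 diffusion steps
winner, truncated_at = schedule_slot(gains, steps)
print(winner, truncated_at)
```

Here user 1 is scheduled although it has completed the fewest diffusion steps, which is the situation the receiver-side path compensation is meant to handle.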
96.9 IT Apr 5
CTD-Diff: Cooperative Time-Division Diffusion for Multi-User Semantic Communication Systems
Chengyang Liang, Dong Li
Semantic communication (SemCom) has emerged as a transformative paradigm for efficient information transmission by emphasizing the exchange of task-relevant meaning rather than raw data. While diffusion-based SemCom models have demonstrated remarkable generative capabilities, existing studies predominantly focus on point-to-point links, overlooking the potential of multi-user (MU) cooperation in MU wireless environments. To address this limitation, we propose a Cooperative Time-Division Diffusion (CTD-Diff) framework. Unlike traditional approaches that view channel noise solely as a detriment, our framework innovatively integrates the noisy wireless transmission process directly into the forward diffusion chain. Specifically, we design a multi-user cooperation mechanism based on Time-Division Multiple Access (TDMA), where idle users overhearing the active transmitter act as semantic collaborators. To maximize the signal fidelity, the receiver employs direct signal aggregation to fuse the direct signal with cooperative copies. This aggregated noisy semantic representation serves as the condition for the reverse diffusion process, allowing the receiver to reconstruct high-fidelity data by mitigating the cumulative channel distortions. By effectively converting physical channel noise into diffusion noise, the proposed method significantly enhances the transmission reliability. Extensive experiments demonstrate that CTD-Diff outperforms various baselines regarding the reconstruction accuracy and the perceptual quality, particularly under challenging low signal-to-noise ratio (SNR) conditions.
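As a rough illustration of why fusing the direct signal with cooperative copies helps, the sketch below averages several noisy copies of the same signal. It assumes unit channel gains, independent Gaussian noise of equal variance on every copy, and a plain element-wise mean; the abstract does not specify the actual aggregation rule, so `aggregate` and `noisy_copy` are hypothetical helpers.

```python
import random
import statistics


def noisy_copy(signal, sigma, rng):
    """One received copy: the signal plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def aggregate(copies):
    """Element-wise mean of equally long received copies (an assumption)."""
    return [sum(vals) / len(vals) for vals in zip(*copies)]


def noise_power(received, signal):
    """Mean squared deviation of a received vector from the clean signal."""
    return statistics.fmean((r - s) ** 2 for r, s in zip(received, signal))


rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
sigma = 1.0

# One direct copy plus three cooperative copies, all equally noisy.
copies = [noisy_copy(signal, sigma, rng) for _ in range(4)]
fused = aggregate(copies)

single_power = noise_power(copies[0], signal)  # close to sigma**2
fused_power = noise_power(fused, signal)       # close to sigma**2 / 4
print(single_power, fused_power)
```

Under these assumptions the residual noise power falls roughly in proportion to the number of copies, so the fused representation passed to the reverse diffusion process behaves like a sample from an earlier, less noisy diffusion step.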
CV Dec 16, 2020
Domain Adaptive Object Detection via Feature Separation and Alignment
Chengyang Liang, Zixiang Zhao, Junmin Liu et al.
Adversarial domain adaptive object detection (DAOD) methods have developed rapidly in recent years. However, two issues remain unresolved. First, many methods reduce the distributional shift only by aligning all features between the source and target domains, ignoring the private information of each domain. Second, DAOD should align features in the image regions where objects exist, but redundant region proposals and background noise can reduce domain transferability. We therefore propose a Feature Separation and Alignment Network (FSANet), which consists of a gray-scale feature separation (GSFS) module, a local-global feature alignment (LGFA) module, and a region-instance-level alignment (RILA) module. The GSFS module uses a dual-stream framework to separate distractive information, which is useless for detection, from shared information, which is useful for it; this focuses the network on the intrinsic features of objects and addresses the first issue. The LGFA and RILA modules then reduce the distributional shifts of the multi-level features. Notably, scale-space filtering is used to adaptively search for the regions to be aligned, and the instance-level features in each region are refined to reduce the redundancy and noise mentioned in the second issue. Experiments on multiple benchmark datasets show that FSANet achieves better detection performance on the target domain and surpasses state-of-the-art methods.
CV May 12, 2020
Efficient and Model-Based Infrared and Visible Image Fusion Via Algorithm Unrolling
Zixiang Zhao, Shuang Xu, Jiangshe Zhang et al.
Infrared and visible image fusion (IVIF) aims to obtain images that retain thermal radiation information from infrared images and texture details from visible images. In this paper, a model-based convolutional neural network (CNN), referred to as Algorithm Unrolling Image Fusion (AUIF), is proposed to overcome the shortcomings of traditional CNN-based IVIF models. AUIF starts from the iterative formulas of two traditional optimization models, which are established to accomplish two-scale decomposition, i.e., separating low-frequency base information and high-frequency detail information from the source images. Algorithm unrolling is then applied: each iteration is mapped to a CNN layer, and each optimization model is transformed into a trainable neural network. Compared with general network architectures, the proposed framework incorporates model-based prior information and therefore has a more principled design. After unrolling, the model contains two decomposers (encoders) and an additional reconstructor (decoder). In the training phase, the network is trained to reconstruct the input image. In the test phase, the base and detail feature maps decomposed from the infrared and visible images are merged, respectively, by an extra fusion layer, and the decoder then outputs the fused image. Qualitative and quantitative comparisons demonstrate the superiority of our model, which robustly generates fused images containing highlighted targets and legible details and exceeds state-of-the-art methods. Furthermore, our network has fewer weights and runs faster.
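The two-scale decomposition mentioned in this abstract can be illustrated with a very simple stand-in. AUIF learns its decomposers by unrolling optimization models; the sketch below instead uses a fixed box blur, which is an assumption chosen only to show the base/detail split. The function names and the `radius` parameter are hypothetical.

```python
def box_blur(img, radius=1):
    """Mean filter with edge clamping; img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
                    n += 1
            out[y][x] = acc / n
    return out


def two_scale(img, radius=1):
    """Split an image into a low-frequency base and a high-frequency detail."""
    base = box_blur(img, radius)
    detail = [[p - b for p, b in zip(row_i, row_b)]
              for row_i, row_b in zip(img, base)]
    return base, detail


image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 9.0, 9.0, 0.0],
         [0.0, 9.0, 9.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
base, detail = two_scale(image)

# Adding the two layers back together recovers the input.
restored = [[b + d for b, d in zip(rb, rd)] for rb, rd in zip(base, detail)]
```

By construction the base and detail layers sum back to the input, which mirrors the training objective of reconstructing the input image from its decomposed features.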