Jinke Ren

IT
h-index116
27papers
715citations
Novelty52%
AI Score57

27 Papers

27.7SPApr 26
Hierarchical Learning for IRS-Assisted MEC Systems with Rate-Splitting Multiple Access

Yinyu Wu, Xuhui Zhang, Yingchao Jiao et al.

Intelligent reflecting surface (IRS)-assisted mobile edge computing (MEC) systems have shown notable improvements in efficiency, such as reduced latency, higher data rates, and better energy efficiency. However, the resource competition among users will lead to uneven allocation, increased latency, and lower throughput. Fortunately, the rate-splitting multiple access (RSMA) technique has emerged as a promising solution for managing interference and optimizing resource allocation in MEC systems. This paper studies an IRS-assisted MEC system with RSMA, aiming to jointly optimize the passive beamforming of the IRS, the active beamforming of the base station, the task offloading allocation, the transmit power of users, the ratios of public and private information allocation, and the decoding order of the RSMA to minimize the average delay from a novel uplink transmission perspective. Since the formulated problem is non-convex and the optimization variables are highly coupled, we propose a hierarchical deep reinforcement learning-based algorithm to optimize both continuous and discrete variables of the problem. Additionally, to better extract channel features, we design a novel network architecture within the policy and evaluation networks of the proposed algorithm, combining convolutional neural networks and densely connected convolutional network for feature extraction. Simulation results indicate that the proposed algorithm not only exhibits excellent convergence performance but also outperforms various benchmarks.

CVNov 26, 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.

3D Visual Grounding (3DVG) aims at localizing 3D object based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG.

98.1CVMar 12Code
SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning

Yuyuan Yang, Junkun Hong, Hongrong Wang et al.

Embodied task planning demands vision-language models to generate action sequences that are both visually grounded and causally coherent over time. However, existing training paradigms face a critical trade-off: joint end-to-end training often leads to premature temporal binding, while standard reinforcement learning methods suffer from optimization instability. To bridge this gap, we present Staged Vision-Language Learning (SVLL), a unified three-stage framework for robust, physically-grounded embodied planning. In the first two stages, SVLL decouples spatial grounding from temporal reasoning, establishing robust visual dependency before introducing sequential action history. In the final stage, we identify a key limitation of standard Direct Preference Optimization (DPO), its purely relative nature -- optimizing only the preference gap between winning and losing trajectories while neglecting absolute likelihood constraints on optimal path, often yields unsafe or hallucinated behaviors. To address this, we further introduce Bias-DPO, a novel alignment objective that injects an inductive bias toward expert trajectories by explicitly maximizing likelihood on ground-truth actions while penalizing overconfident hallucinations. By anchoring the policy to the expert manifold and mitigating causal misalignment, SVLL, powered by Bias-DPO, ensures strict adherence to environmental affordances and effectively suppresses physically impossible shortcuts. Finally, extensive experiments on the interactive AI2-THOR benchmark and real-world robotic deployments demonstrate that SVLL outperforms both state-of-the-art open-source (e.g., Qwen2.5-VL-7B) and closed-source models (e.g., GPT-4o, Gemini-2.0-flash) in task success rate, while significantly reducing physical constraint violations.

ITNov 21, 2023
Knowledge Base Enabled Semantic Communication: A Generative Perspective

Jinke Ren, Zezhong Zhang, Jie Xu et al.

Semantic communication is widely touted as a key technology for propelling the sixth-generation (6G) wireless networks. However, providing effective semantic representation is quite challenging in practice. To address this issue, this article takes a crack at exploiting semantic knowledge base (KB) to usher in a new era of generative semantic communication. Via semantic KB, source messages can be characterized in low-dimensional subspaces without compromising their desired meanings, thus significantly enhancing the communication efficiency. The fundamental principle of semantic KB is first introduced, and a generative semantic communication architecture is developed by presenting three sub-KBs, namely source, task, and channel KBs. Then, the detailed construction approaches for each sub-KB are described, followed by their utilization in terms of semantic coding and transmission. A case study is also provided to showcase the superiority of generative semantic communication over conventional syntactic communication and classical semantic communication. In a nutshell, this article establishes a scientific foundation for the exciting uncharted frontier of generative semantic communication.

15.2ITMar 19
UAV-Enabled ISAC with Fluid Antennas for Low-Altitude Wireless Networks

Wenchao Liu, Xuhui Zhang, Jinke Ren et al.

Unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is regarded as a key enabler for next-generation wireless systems. However, conventional fixed-position antennas limit the ability of UAVs to fully exploit their inherent potential. To overcome this limitation, we propose a UAV-enabled ISAC framework equipped with fluid antennas (FAs), where the mobility of antenna elements introduces additional spatial degrees of freedom to simultaneously enhance communication and sensing performance. A multi-objective optimization problem is formulated to maximize the communication rates of multiple users while minimizing the Cramér-Rao bound (CRB) for the angle estimation of a single target. Due to excessively frequent updates of FA positions may lead to response delay, a three-timescale optimization framework is developed to jointly optimize transmit beamforming, FA positions, and UAV trajectory based on their characteristics. To solve the non-convexity of the problem, an alternating optimization-based algorithm is developed to obtain a sub-optimal solution. Numerical results show that the proposed scheme significantly outperforms various benchmark schemes, validating the effectiveness of integrating the FA technology into the UAV-enabled ISAC systems.

SYAug 4, 2024
Latency-Aware Resource Allocation for Mobile Edge Generation and Computing via Deep Reinforcement Learning

Yinyu Wu, Xuhui Zhang, Jinke Ren et al.

Recently, the integration of mobile edge computing (MEC) and generative artificial intelligence (GAI) technology has given rise to a new area called mobile edge generation and computing (MEGC), which offers mobile users heterogeneous services such as task computing and content generation. In this letter, we investigate the joint communication, computation, and the AIGC resource allocation problem in an MEGC system. A latency minimization problem is first formulated to enhance the quality of service for mobile users. Due to the strong coupling of the optimization variables, we propose a new deep reinforcement learning-based algorithm to solve it efficiently. Numerical results demonstrate that the proposed algorithm can achieve lower latency than two baseline algorithms.

SPApr 29, 2024Code
Joint Signal Detection and Automatic Modulation Classification via Deep Learning

Huijun Xing, Xuhui Zhang, Shuo Chang et al.

Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData).

ROMar 2, 2025Code
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Mingcong Lei, Ge Wang, Yiming Zhao et al.

Large Language Models (LLMs) exhibit remarkable capabilities in the hierarchical decomposition of complex tasks through semantic reasoning. However, their application in embodied systems faces challenges in ensuring reliable execution of subtask sequences and achieving one-shot success in long-term task completion. To address these limitations in dynamic environments, we propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management. The framework features two core innovations: (1) Interactive task planner that dynamically generates executable subtasks based on the environmental memory, and (2) Multimodal execution critic employing an evaluation framework to conduct a probabilistic assessment of action feasibility, triggering hierarchical re-planning mechanisms when environmental perturbations exceed preset thresholds. To validate CLEA's effectiveness, we conduct experiments in a real environment with manipulable objects, using two heterogeneous robots for object search, manipulation, and search-manipulation integration tasks. Across 12 task trials, CLEA outperforms the baseline model, achieving a 67.3% improvement in success rate and a 52.8% increase in task completion rate. These results demonstrate that CLEA significantly enhances the robustness of task planning and execution in dynamic environments.

52.0ITMay 11
Survey-Free Radio Map Construction via HMM-Based Coarse-to-Fine Inference

Zheng Xing, Weibing Zhao, Guanghui Zhang et al.

Traditional radio map construction methods mandate labor-intensive data collection and precise location labeling. To address these limitations, we propose a novel survey-free approach for radio map construction that relies solely on unlabeled Received Signal Strength (RSS) measurements, thereby obviating the need for manual site surveys or auxiliary Inertial Measurement Units (IMUs). The key idea involves embedding multiple unlabeled RSS sequences into a known indoor layout, specifically targeting corridor-guided environments with a dominant unidirectional pedestrian flow. However, aligning the embedded coordinates with the RSS collection locations remains challenging due to the random fluctuations inherent in RSS data. To tackle this, we introduce a Hidden Markov Model (HMM)- based Coarse-to-Fine Inference (HCFI) framework. At the coarse level, we employ an HMM-based region label inference algorithm to partition RSS sequences and align the RSS segments with specific physical regions using graph-based inference. At the fine level, we develop an HMM-based location label inference technique to estimate RSS collection coordinates by leveraging RSS propagation principles while incorporating sequential spatio-temporal mobility probability. Empirical results from an office environment demonstrate that the proposed method achieves a radio map construction Mean Absolute Error (MAE) of 8.96 dB. Furthermore, based on the estimated radio map, k-Nearest Neighbor (KNN) localization yields an average positioning error of approximately 3.33 meters, offering a highly viable, survey-free solution for radio map construction under sequential topological assumptions.

50.8ITMay 11
Annotation-Free Indoor Radio Mapping via Physics-Informed Trajectory Inference

Zheng Xing, Mengru Wu, Yi Zhang et al.

Constructing indoor radio maps traditionally requires extensive site surveys with precise user-location labels, making the calibration process costly and time-consuming. Existing calibration-reduction methods either depend on partial location annotations or exploit inertial measurement units (IMUs) to provide relative motion cues; however, IMU-assisted solutions are constrained by hardware availability, device-level access restrictions, and accumulated sensor drift. In this paper, we study a location-label-free indoor radio mapping problem under known access-point deployment geometry and a known walkable spatial domain. We propose a physics-informed trajectory inference framework that uses only Channel State Information (CSI), without relying on user-location labels or IMU measurements. The key idea is to recover the latent spatial coordinates of CSI measurements by exploiting the local spatial continuity of multipath propagation. To this end, we construct a Power-Angle-Delay Profile (PADP) feature distance from MIMO-OFDM CSI and show that, within a local neighborhood and under quasi-static multipath conditions, this distance provides a physically meaningful proxy for small spatial displacements. We then incorporate the PADP-based continuity constraint into a spatially regularized Bayesian inference model for joint trajectory recovery and propagation-parameter estimation. Experiments on a real-world industrial CSI dataset demonstrate that the proposed framework achieves an average localization error of 0.88 m and a relative beam map construction error of 6.68%, improving upon representative channel-embedding and IMU-assisted baselines.

LGSep 28, 2025Code
Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning

Danni Yang, Zhikang Chen, Sen Cui et al. · pku

Federated continual learning (FCL) has garnered increasing attention for its ability to support distributed computation in environments with evolving data distributions. However, the emergence of new tasks introduces both temporal and cross-client shifts, making catastrophic forgetting a critical challenge. Most existing works aggregate knowledge from clients into a global model, which may not enhance client performance since irrelevant knowledge could introduce interference, especially in heterogeneous scenarios. Additionally, directly applying decentralized approaches to FCL suffers from ineffective group formation caused by task changes. To address these challenges, we propose a decentralized dynamic cooperation framework for FCL, where clients establish dynamic cooperative learning coalitions to balance the acquisition of new knowledge and the retention of prior learning, thereby obtaining personalized models. To maximize model performance, each client engages in selective cooperation, dynamically allying with others who offer meaningful performance gains. This results in non-overlapping, variable coalitions at each stage of the task. Moreover, we use coalitional affinity game to simulate coalition relationships between clients. By assessing both client gradient coherence and model similarity, we quantify the client benefits derived from cooperation. We also propose a merge-blocking algorithm and a dynamic cooperative evolution algorithm to achieve cooperative and dynamic equilibrium. Comprehensive experiments demonstrate the superiority of our method compared to various baselines. Code is available at: https://github.com/ydn3229/DCFCL.

SPDec 10, 2024
Latency Minimization for UAV-Enabled Federated Learning: Trajectory Design and Resource Allocation

Xuhui Zhang, Wenchao Liu, Jinke Ren et al.

Federated learning (FL) has become a transformative paradigm for distributed machine learning across wireless networks. However, the performance of FL is often hindered by the unreliable communication links between resource-constrained Internet of Things (IoT) devices and the central server. To overcome this challenge, we propose a novel framework that employs an unmanned aerial vehicle (UAV) as a mobile server to enhance the FL training process. By capitalizing on the UAV's mobility, we establish strong line-of-sight connections with IoT devices, thereby enhancing communication reliability and capacity. To maximize training efficiency, we formulate a latency minimization problem that jointly optimizes bandwidth allocation, computing frequencies, transmit power for both the UAV and IoT devices, and the UAV's flight trajectory. Subsequently, we analyze the required rounds of the IoT devices training and the UAV aggregation for FL convergence. Based on the convergence constraint, we transform the problem into three subproblems and develop an efficient alternating optimization algorithm to solve this problem effectively. Additionally, we provide a thorough analysis of the algorithm's convergence and computational complexity. Extensive numerical results demonstrate that our proposed scheme not only surpasses existing benchmark schemes in reducing latency up to 15.29%, but also achieves training efficiency that nearly matches the ideal scenario.

LGJan 29, 2024
Scalable Federated Unlearning via Isolated and Coded Sharding

Yijing Lin, Zhipeng Gao, Hongyang Du et al.

Federated unlearning has emerged as a promising paradigm to erase the client-level data effect without affecting the performance of collaborative learning models. However, the federated unlearning process often introduces extensive storage overhead and consumes substantial computational resources, thus hindering its implementation in practice. To address this issue, this paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing. We first divide distributed clients into multiple isolated shards across stages to reduce the number of clients being affected. Then, to reduce the storage overhead of the central server, we develop a coded computing mechanism by compressing the model parameters across different shards. In addition, we provide the theoretical analysis of time efficiency and storage effectiveness for the isolated and coded sharding. Finally, extensive experiments on two typical learning tasks, i.e., classification and generation, demonstrate that our proposed framework can achieve better performance than three state-of-the-art frameworks in terms of accuracy, retraining time, storage overhead, and F1 scores for resisting membership inference attacks.

LGJan 29, 2024
Blockchain-enabled Trustworthy Federated Unlearning

Yijing Lin, Zhipeng Gao, Hongyang Du et al.

Federated unlearning is a promising paradigm for protecting the data ownership of distributed clients. It allows central servers to remove historical data effects within the machine learning model as well as address the "right to be forgotten" issue in federated learning. However, existing works require central servers to retain the historical model parameters from distributed clients, such that allows the central server to utilize these parameters for further training even, after the clients exit the training process. To address this issue, this paper proposes a new blockchain-enabled trustworthy federated unlearning framework. We first design a proof of federated unlearning protocol, which utilizes the Chameleon hash function to verify data removal and eliminate the data contributions stored in other clients' models. Then, an adaptive contribution-based retraining mechanism is developed to reduce the computational overhead and significantly improve the training efficiency. Extensive experiments demonstrate that the proposed framework can achieve a better data removal effect than the state-of-the-art frameworks, marking a significant stride towards trustworthy federated unlearning.

ITNov 27, 2024
Generative Semantic Communication for Joint Image Transmission and Segmentation

Weiwen Yuan, Jinke Ren, Chongjie Wang et al.

Semantic communication has emerged as a promising technology for enhancing communication efficiency. However, most existing research emphasizes single-task reconstruction, neglecting model adaptability and generalization across multi-task systems. In this paper, we propose a novel generative semantic communication system that supports both image reconstruction and segmentation tasks. Our approach builds upon semantic knowledge bases (KBs) at both the transmitter and receiver, with each semantic KB comprising a source KB and a task KB. The source KB at the transmitter leverages a hierarchical Swin-Transformer, a generative AI scheme, to extract multi-level features from the input image. Concurrently, the counterpart source KB at the receiver utilizes hierarchical residual blocks to generate task-specific knowledge. Furthermore, the task KBs adopt a semantic similarity model to map different task requirements into pre-defined task instructions, thereby facilitating the feature selection of the source KBs. Additionally, we develop a unified residual block-based joint source and channel (JSCC) encoder and two task-specific JSCC decoders to achieve the two image tasks. In particular, a generative diffusion model is adopted to construct the JSCC decoder for the image reconstruction task. Experimental results show that our multi-task generative semantic communication system outperforms previous single-task communication systems in terms of peak signal-to-noise ratio and segmentation accuracy.

ITDec 11, 2024
Generative Semantic Communication: Architectures, Technologies, and Applications

Jinke Ren, Yaping Sun, Hongyang Du et al.

This paper delves into the applications of generative artificial intelligence (GAI) in semantic communication (SemCom) and presents a thorough study. Three popular SemCom systems enabled by classical GAI models are first introduced, including variational autoencoders, generative adversarial networks, and diffusion models. For each system, the fundamental concept of the GAI model, the corresponding SemCom architecture, and the associated literature review of recent efforts are elucidated. Then, a novel generative SemCom system is proposed by incorporating the cutting-edge GAI technology-large language models (LLMs). This system features two LLM-based AI agents at both the transmitter and receiver, serving as "brains" to enable powerful information understanding and content regeneration capabilities, respectively. This innovative design allows the receiver to directly generate the desired content, instead of recovering the bit stream, based on the coded semantic information conveyed by the transmitter. Therefore, it shifts the communication mindset from "information recovery" to "information regeneration" and thus ushers in a new era of generative SemCom. A case study on point-to-point video retrieval is presented to demonstrate the superiority of the proposed generative SemCom system, showcasing a 99.98% reduction in communication overhead and a 53% improvement in retrieval accuracy compared to the traditional communication system. Furthermore, four typical application scenarios for generative SemCom are delineated, followed by a discussion of three open issues warranting future investigation. In a nutshell, this paper provides a holistic set of guidelines for applying GAI in SemCom, paving the way for the efficient implementation of generative SemCom in future wireless networks.

IVNov 25, 2024
Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence

Yuncheng Jiang, Chun-Mei Feng, Jinke Ren et al.

Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and suboptimal image quality, which complicates interpretation and increases the likelihood of diagnostic errors. Artificial intelligence (AI) has emerged as a promising solution to enhance clinical diagnosis, particularly in detecting abnormalities across various biomedical imaging modalities. Nonetheless, current AI models for ultrasound imaging face critical challenges. First, these models often require large volumes of labeled medical data, raising concerns over patient privacy breaches. Second, most existing models are task-specific, which restricts their broader clinical utility. To overcome these challenges, we present UltraFedFM, an innovative privacy-preserving ultrasound foundation model. UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries, leveraging a dataset of over 1 million ultrasound images covering 19 organs and 10 ultrasound modalities. This extensive and diverse data, combined with a secure training framework, enables UltraFedFM to exhibit strong generalization and diagnostic capabilities. It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation. Notably, UltraFedFM surpasses the diagnostic accuracy of mid-level ultrasonographers and matches the performance of expert-level sonographers in the joint diagnosis of 8 common systemic diseases. These findings indicate that UltraFedFM can significantly enhance clinical diagnostics while safeguarding patient privacy, marking an advancement in AI-driven ultrasound imaging for future clinical applications.

AIFeb 14, 2025
STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning

Mingcong Lei, Yiming Zhao, Ge Wang et al.

A key objective of embodied intelligence is enabling agents to perform long-horizon tasks in dynamic environments while maintaining robust decision-making and adaptability. To achieve this goal, we propose the Spatio-Temporal Memory Agent (STMA), a novel framework designed to enhance task planning and execution by integrating spatio-temporal memory. STMA is built upon three critical components: (1) a spatio-temporal memory module that captures historical and environmental changes in real time, (2) a dynamic knowledge graph that facilitates adaptive spatial reasoning, and (3) a planner-critic mechanism that iteratively refines task strategies. We evaluate STMA in the TextWorld environment on 32 tasks, involving multi-step planning and exploration under varying levels of complexity. Experimental results demonstrate that STMA achieves a 31.25% improvement in success rate and a 24.7% increase in average score compared to the state-of-the-art model. The results highlight the effectiveness of spatio-temporal memory in advancing the memory capabilities of embodied agents.

CVMar 29, 2025
Empowering Large Language Models with 3D Situation Awareness

Zhihao Yuan, Yibo Peng, Jinke Ren et al.

Driven by the great success of Large Language Models (LLMs) in the 2D image domain, their applications in 3D scene understanding has emerged as a new trend. A key difference between 3D and 2D is that the situation of an egocentric observer in 3D scenes can change, resulting in different descriptions (e.g., ''left" or ''right"). However, current LLM-based methods overlook the egocentric perspective and simply use datasets from a global viewpoint. To address this issue, we propose a novel approach to automatically generate a situation-aware dataset by leveraging the scanning trajectory during data collection and utilizing Vision-Language Models (VLMs) to produce high-quality captions and question-answer pairs. Furthermore, we introduce a situation grounding module to explicitly predict the position and orientation of observer's viewpoint, thereby enabling LLMs to ground situation description in 3D scenes. We evaluate our approach on several benchmarks, demonstrating that our method effectively enhances the 3D situational awareness of LLMs while significantly expanding existing datasets and reducing manual effort.

CVAug 23, 2025
MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition

Yudong Hu, Yueju Han, Rui Sun et al.

Capsule Network (CapsNet) has demonstrated significant potential in visual recognition by capturing spatial relationships and part-whole hierarchies for learning equivariant feature representations. However, existing CapsNet and variants often rely on a single high-level feature map, overlooking the rich complementary information from multi-scale features. Furthermore, conventional feature fusion strategies (e.g., addition and concatenation) struggle to reconcile multi-scale feature discrepancies, leading to suboptimal classification performance. To address these limitations, we propose the Multi-Scale Patchify Capsule Network (MSPCaps), a novel architecture that integrates multi-scale feature learning and efficient capsule routing. Specifically, MSPCaps consists of three key components: a Multi-Scale ResNet Backbone (MSRB), a Patchify Capsule Layer (PatchifyCaps), and Cross-Agreement Routing (CAR) blocks. First, the MSRB extracts diverse multi-scale feature representations from input images, preserving both fine-grained details and global contextual information. Second, the PatchifyCaps partitions these multi-scale features into primary capsules using a uniform patch size, equipping the model with the ability to learn from diverse receptive fields. Finally, the CAR block adaptively routes the multi-scale capsules by identifying cross-scale prediction pairs with maximum agreement. Unlike the simple concatenation of multiple self-routing blocks, CAR ensures that only the most coherent capsules contribute to the final voting. Our proposed MSPCaps achieves remarkable scalability and superior robustness, consistently surpassing multiple baseline methods in terms of classification accuracy, with configurations ranging from a highly efficient Tiny model (344.3K parameters) to a powerful Large model (10.9M parameters), highlighting its potential in advancing feature representation learning.

CLDec 19, 2024
Adaptive Pruning for Large Language Models with Structural Importance Awareness

Haotian Zheng, Jinke Ren, Yushan Sun et al.

The recent advancements in large language models (LLMs) have significantly improved language understanding and generation capabilities. However, it is difficult to deploy LLMs on resource-constrained edge devices due to their high computational and storage resource demands. To address this issue, we propose a novel LLM model pruning method, namely structurally-aware adaptive pruning (SAAP), to significantly reduce the computational and memory costs while maintaining model performance. We first define an adaptive importance fusion metric to evaluate the importance of all coupled structures in LLMs by considering their homoscedastic uncertainty. Then, we rank the importance of all modules to determine the specific layers that should be pruned to meet particular performance requirements. Furthermore, we develop a new group fine-tuning strategy to improve the inference efficiency of LLMs. Finally, we evaluate the proposed SAAP method on multiple LLMs across two common tasks, i.e., zero-shot classification and text generation. Experimental results show that our SAAP method outperforms several state-of-the-art baseline methods, achieving 2.17%, 2.37%, and 2.39% accuracy gains on LLaMA-7B, Vicuna-7B, and LLaMA-13B. Additionally, SAAP improves the token generation speed by 5%, showcasing its practical advantages in resource-constrained scenarios.

CVApr 27, 2024
Instance-free Text to Point Cloud Localization with Relative Position Awareness

Lichao Wang, Zhihao Yuan, Jinke Ren et al.

Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration. It seeks to localize a position from a city-scale point cloud scene based on a few natural language instructions. In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances. Our proposed model follows a two-stage pipeline, including a coarse stage for text-cell retrieval and a fine stage for position estimation. In both stages, we introduce an instance query extractor, in which the cells are encoded by a 3D sparse convolution U-Net to generate the multi-scale point cloud features, and a set of queries iteratively attend to these features to represent instances. In the coarse stage, a row-column relative position-aware self-attention (RowColRPA) module is designed to capture the spatial relations among the instance queries. In the fine stage, a multi-modal relative position-aware cross-attention (RPCA) module is developed to fuse the text and point cloud features along with spatial relations for improving fine position estimation. Experiment results on the KITTI360Pose dataset demonstrate that our model achieves competitive performance with the state-of-the-art models without taking ground-truth instances as input.

LGMay 26, 2023
SR-OOD: Out-of-Distribution Detection via Sample Repairing

Rui Sun, Andi Zhang, Haiming Zhang et al.

Out-of-distribution (OOD) detection is a crucial task for ensuring the reliability and robustness of machine learning models. Recent works have shown that generative models often assign high confidence scores to OOD samples, indicating that they fail to capture the semantic information of the data. To tackle this problem, we take advantage of sample repairing and propose a novel OOD detection framework, namely SR-OOD. Our framework leverages the idea that repairing an OOD sample can reveal its semantic inconsistency with the in-distribution data. Specifically, our framework consists of two components: a sample repairing module and a detection module. The sample repairing module applies erosion to an input sample and uses a generative adversarial network to repair it. The detection module then determines whether the input sample is OOD using a distance metric. Our framework does not require any additional data or label information for detection, making it applicable to various scenarios. We conduct extensive experiments on three image datasets: CIFAR-10, CelebA, and Pokemon. The results demonstrate that our approach achieves superior performance over the state-of-the-art generative methods in OOD detection.

LGJul 19, 2021
A New Distributed Method for Training Generative Adversarial Networks

Jinke Ren, Chonghe Liu, Guanding Yu et al.

Generative adversarial networks (GANs) are emerging machine learning models for generating synthesized data similar to real data by jointly training a generator and a discriminator. In many applications, data and computational resources are distributed over many devices, so centralized computation with all data in one location is infeasible due to privacy and/or communication constraints. This paper proposes a new framework for training GANs in a distributed fashion: Each device computes a local discriminator using local data; a single server aggregates their results and computes a global GAN. Specifically, in each iteration, the server sends the global GAN to the devices, which then update their local discriminators; the devices send their results to the server, which then computes their average as the global discriminator and updates the global generator accordingly. Two different update schedules are designed with different levels of parallelism between the devices and the server. Numerical results obtained using three popular datasets demonstrate that the proposed framework can outperform a state-of-the-art framework in terms of convergence speed.

ITApr 1, 2020
Scheduling for Cellular Federated Edge Learning with Importance and Channel Awareness

Jinke Ren, Yinghui He, Dingzhu Wen et al.

In cellular federated edge learning (FEEL), multiple edge devices holding local data jointly train a neural network by communicating learning updates with an access point without exchanging their data samples. With very limited communication resources, it is beneficial to schedule the most informative local learning updates. In this paper, a novel scheduling policy is proposed to exploit both diversity in multiuser channels and diversity in the "importance" of the edge devices' learning updates. First, a new probabilistic scheduling framework is developed to yield unbiased update aggregation in FEEL. The importance of a local learning update is measured by its gradient divergence. If one edge device is scheduled in each communication round, the scheduling policy is derived in closed form to achieve the optimal trade-off between channel quality and update importance. The probabilistic scheduling framework is then extended to allow scheduling multiple edge devices in each communication round. Numerical results obtained using popular models and learning datasets demonstrate that the proposed scheduling policy can achieve faster model convergence and higher learning accuracy than conventional scheduling policies that only exploit a single type of diversity.

ITNov 10, 2019
An Overview of Data-Importance Aware Radio Resource Management for Edge Machine Learning

Dingzhu Wen, Xiaoyang Li, Qunsong Zeng et al.

The 5G network connecting billions of Internet-of-Things (IoT) devices will make it possible to harvest an enormous amount of real-time mobile data. Furthermore, the 5G virtualization architecture will enable cloud computing at the (network) edge. The availability of both rich data and computation power at the edge has motivated Internet companies to deploy artificial intelligence (AI) there, creating the hot area of edge-AI. Edge learning, the theme of this project, concerns training edge-AI models, which endow on IoT devices intelligence for responding to real-time events. However, the transmission of high-dimensional data from many edge devices to servers can result in excessive communication latency, creating a bottleneck for edge learning. Traditional wireless techniques deigned for only radio access are ineffective in tackling the challenge. Attempts to overcome the communication bottleneck has led to the development of a new class of techniques for intelligent radio resource management (RRM), called data-importance aware RRM. Their designs feature the interplay of active machine learning and wireless communication. Specifically, the metrics that measure data importance in active learning (e.g., classification uncertainty and data diversity) are applied to RRM for efficient acquisition of distributed data in wireless networks to train AI models at servers. This article aims at providing an introduction to the emerging area of importance-aware RRM. To this end, we will introduce the design principles, survey recent advancements in the area, discuss some design examples, and suggest some promising research opportunities.

LGMay 23, 2019
Accelerating DNN Training in Wireless Federated Edge Learning Systems

Jinke Ren, Guanding Yu, Guangyao Ding

Training task in classical machine learning models, such as deep neural networks, is generally implemented at a remote cloud center for centralized learning, which is typically time-consuming and resource-hungry. It also incurs serious privacy issue and long communication latency since a large amount of data are transmitted to the centralized node. To overcome these shortcomings, we consider a newly-emerged framework, namely federated edge learning, to aggregate local learning updates at the network edge in lieu of users' raw data. Aiming at accelerating the training process, we first define a novel performance evaluation criterion, called learning efficiency. We then formulate a training acceleration optimization problem in the CPU scenario, where each user device is equipped with CPU. The closed-form expressions for joint batchsize selection and communication resource allocation are developed and some insightful results are highlighted. Further, we extend our learning framework to the GPU scenario. The optimal solution in this scenario is manifested to have the similar structure as that of the CPU scenario, recommending that our proposed algorithm is applicable in more general systems. Finally, extensive experiments validate the theoretical analysis and demonstrate that the proposed algorithm can reduce the training time and improve the learning accuracy simultaneously.