Mugen Peng

h-index65

8papers

751citations

Novelty33%

AI Score37

Ranked #93,871 of 194,257 authors (top 48%)#31,537 in CV (top 53%)

8 Papers

5.1ITFeb 13, 2023

Multi-Carrier NOMA-Empowered Wireless Federated Learning with Optimal Power and Bandwidth Allocation

Weicai Li, Tiejun Lv, Yashuai Cao et al.

Wireless federated learning (WFL) undergoes a communication bottleneck in uplink, limiting the number of users that can upload their local models in each global aggregation round. This paper presents a new multi-carrier non-orthogonal multiple-access (MC-NOMA)-empowered WFL system under an adaptive learning setting of Flexible Aggregation. Since a WFL round accommodates both local model training and uploading for each user, the use of Flexible Aggregation allows the users to train different numbers of iterations per round, adapting to their channel conditions and computing resources. The key idea is to use MC-NOMA to concurrently upload the local models of the users, thereby extending the local model training times of the users and increasing participating users. A new metric, namely, Weighted Global Proportion of Trained Mini-batches (WGPTM), is analytically established to measure the convergence of the new system. Another important aspect is that we maximize the WGPTM to harness the convergence of the new system by jointly optimizing the transmit powers and subchannel bandwidths. This nonconvex problem is converted equivalently to a tractable convex problem and solved efficiently using variable substitution and Cauchy's inequality. As corroborated experimentally using a convolutional neural network and an 18-layer residential network, the proposed MC-NOMA WFL can efficiently reduce communication delay, increase local model training times, and accelerate the convergence by over 40%, compared to its existing alternative.

12.1CVDec 30, 2024Code

UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models

Yujie Li, Wenjia Xu, Guangzuo Li et al.

The domain gap between remote sensing imagery and natural images has recently received widespread attention and Vision-Language Models (VLMs) have demonstrated excellent generalization performance in remote sensing multimodal tasks. However, current research is still limited in exploring how remote sensing VLMs handle different types of visual inputs. To bridge this gap, we introduce \textbf{UniRS}, the first vision-language model \textbf{uni}fying multi-temporal \textbf{r}emote \textbf{s}ensing tasks across various types of visual input. UniRS supports single images, dual-time image pairs, and videos as input, enabling comprehensive remote sensing temporal analysis within a unified framework. We adopt a unified visual representation approach, enabling the model to accept various visual inputs. For dual-time image pair tasks, we customize a change extraction module to further enhance the extraction of spatiotemporal features. Additionally, we design a prompt augmentation mechanism tailored to the model's reasoning process, utilizing the prior knowledge of the general-purpose VLM to provide clues for UniRS. To promote multi-task knowledge sharing, the model is jointly fine-tuned on a mixed dataset. Experimental results show that UniRS achieves state-of-the-art performance across diverse tasks, including visual question answering, change captioning, and video scene classification, highlighting its versatility and effectiveness in unifying these multi-temporal remote sensing tasks. Our code and dataset will be released soon.

16.8CVMay 20, 2024Code

UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization

Wenjia Xu, Yaxuan Yao, Jiaqi Cao et al.

The application of unmanned aerial vehicles (UAV) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of UAV with the ortho satellite maps. However, collecting UAV ground-down view images across diverse locations is costly, leading to a scarcity of large-scale datasets for real-world scenarios. Existing datasets for UAV visual localization are often limited to small geographic areas or are focused only on urban regions with distinct textures. To address this, we define the UAV visual localization task by determining the UAV's real position coordinates on a large-scale satellite map based on the captured ground-down view. In this paper, we present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task. This dataset comprises images from diverse drones across 11 locations in China, capturing a range of topographical features. The dataset features images from fixed-wing drones and multi-terrain drones, captured at different altitudes and orientations. Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date. Our dataset is tailored to support both the training and testing of models by providing a diverse and extensive data.

7.6CVFeb 3, 2024Code

Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification

Wenjia Xu, Jiuniu Wang, Zhiwei Wei et al.

Deep neural networks have achieved promising progress in remote sensing (RS) image classification, for which the training process requires abundant samples for each class. However, it is time-consuming and unrealistic to annotate labels for each RS category, given the fact that the RS target database is increasing dynamically. Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training, which provides a promising solution for the aforementioned problem. However, previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes. Besides, pioneer ZSL models use convolutional neural networks pre-trained on ImageNet, which focus on the main objects appearing in each image, neglecting the background context that also matters in RS scene classification. To address the above problems, we propose to collect visually detectable attributes automatically. We predict attributes for each class by depicting the semantic-visual similarity between attributes and images. In this way, the attribute annotation process is accomplished by machine instead of human as in other methods. Moreover, we propose a Deep Semantic-Visual Alignment (DSVA) that take advantage of the self-attention mechanism in the transformer to associate local image regions together, integrating the background context information for prediction. The DSVA model further utilizes the attribute attention maps to focus on the informative image regions that are essential for knowledge transfer in ZSL, and maps the visual images into attribute space to perform ZSL classification. With extensive experiments, we show that our model outperforms other state-of-the-art models by a large margin on a challenging large-scale RS scene classification benchmark.

10.2CVSep 7, 2025

BTCChat: Advancing Remote Sensing Bi-temporal Change Captioning with Multimodal Large Language Model

Yujie Li, Wenjia Xu, Yuanben Zhang et al.

Bi-temporal satellite imagery supports critical applications such as urban development monitoring and disaster assessment. Although powerful multimodal large language models (MLLMs) have been applied in bi-temporal change analysis, previous methods process image pairs through direct concatenation, inadequately modeling temporal correlations and spatial semantic changes. This deficiency hampers visual-semantic alignment in change understanding, thereby constraining the overall effectiveness of current approaches. To address this gap, we propose BTCChat, a multi-temporal MLLM with advanced bi-temporal change understanding capability. BTCChat supports bi-temporal change captioning and retains single-image interpretation capability. To better capture temporal features and spatial semantic changes in image pairs, we design a Change Extraction module. Moreover, to enhance the model's attention to spatial details, we introduce a Prompt Augmentation mechanism, which incorporates contextual clues into the prompt to enhance model performance. Experimental results demonstrate that BTCChat achieves state-of-the-art performance on change captioning and visual question answering tasks.

5.9DCMay 8, 2021

Blockchain Systems, Technologies and Applications: A Methodology Perspective

Bin Cao, Zixin Wang, Long Zhang et al.

In the past decade, blockchain has shown a promising vision greatly to build the trust without any powerful third party in a secure, decentralized and salable manner. However, due to the wide application and future development from cryptocurrency to Internet of Things, blockchain is an extremely complex system enabling integration with mathematics, finance, computer science, communication and network engineering, etc. As a result, it is a challenge for engineer, expert and researcher to fully understand the blockchain process in a systematic view from top to down. First, this article introduces how blockchain works, the research activity and challenge, and illustrates the roadmap involving the classic methodology with typical blockchain use cases and topics. Second, in blockchain system, how to adopt stochastic process, game theory, optimization, machine learning and cryptography to study blockchain running process and design blockchain protocol/algorithm are discussed in details. Moreover, the advantage and limitation using these methods are also summarized as the guide of future work to further considered. Finally, some remaining problems from technical, commercial and political views are discussed as the open issues. The main findings of this article will provide an overview in a methodology perspective to study theoretical model for blockchain fundamentals understanding, design network service for blockchain-based mechanisms and algorithms, as well as apply blockchain for Internet of Things, etc.

16.0CRApr 27, 2021

Secure and Efficient Federated Learning Through Layering and Sharding Blockchain

Shuo Yuan, Bin Cao, Yao Sun et al.

Introducing blockchain into Federated Learning (FL) to build a trusted edge computing environment for transmission and learning has attracted widespread attention as a new decentralized learning pattern. However, traditional consensus mechanisms and architectures of blockchain systems face significant challenges in handling large-scale FL tasks, especially on Internet of Things (IoT) devices, due to their substantial resource consumption, limited transaction throughput, and complex communication requirements. To address these challenges, this paper proposes ChainFL, a novel two-layer blockchain-driven FL system. It splits the IoT network into multiple shards within the subchain layer, effectively reducing the scale of information exchange, and employs a Direct Acyclic Graph (DAG)-based mainchain as the mainchain layer, enabling parallel and asynchronous cross-shard validation. Furthermore, the FL procedure is customized to integrate deeply with blockchain technology, and a modified DAG consensus mechanism is designed to mitigate distortion caused by abnormal models. To provide a proof-of-concept implementation and evaluation, multiple subchains based on Hyperledger Fabric and a self-developed DAG-based mainchain are deployed. Extensive experiments demonstrate that ChainFL significantly surpasses conventional FL systems, showing up to a 14% improvement in training efficiency and a threefold increase in robustness.

21.0NISep 24, 2018

Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues

Yaohua Sun, Mugen Peng, Yangcheng Zhou et al.

As a key technique for enabling artificial intelligence, machine learning (ML) is capable of solving complex problems without explicit programming. Motivated by its successful applications to many practical tasks like image recognition, both industry and the research community have advocated the applications of ML in wireless communication. This paper comprehensively surveys the recent advances of the applications of ML in wireless communication, which are classified as: resource management in the MAC layer, networking and mobility management in the network layer, and localization in the application layer. The applications in resource management further include power control, spectrum management, backhaul management, cache management, beamformer design and computation resource management, while ML based networking focuses on the applications in clustering, base station switching control, user association and routing. Moreover, literatures in each aspect is organized according to the adopted ML techniques. In addition, several conditions for applying ML to wireless communication are identified to help readers decide whether to use ML and which kind of ML techniques to use, and traditional approaches are also summarized together with their performance comparison with ML based approaches, based on which the motivations of surveyed literatures to adopt ML are clarified. Given the extensiveness of the research area, challenges and unresolved issues are presented to facilitate future studies, where ML based network slicing, infrastructure update to support ML based paradigms, open data sets and platforms for researchers, theoretical guidance for ML implementation and so on are discussed.