CLDec 15, 2022Code
Visually-augmented pretrained language models for NLP tasks without imagesHangyu Guo, Kun Zhou, Wayne Xin Zhao et al.
Although pre-trained language models~(PLMs) have shown impressive performance by text-only self-supervised training, they are found lack of visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel \textbf{V}isually-\textbf{A}ugmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks, \textbf{W}ithout using any retrieved or generated \textbf{I}mages, namely \textbf{VAWI}. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks. Our codes and data are publicly available at~\url{https://github.com/RUCAIBox/VAWI}.
SYMay 3, 2022
Real-time Cooperative Vehicle Coordination at Unsignalized Road IntersectionsJiping Luo, Tingting Zhang, Rui Hao et al.
Cooperative coordination at unsignalized road intersections, which aims to improve the driving safety and traffic throughput for connected and automated vehicles, has attracted increasing interests in recent years. However, most existing investigations either suffer from computational complexity or cannot harness the full potential of the road infrastructure. To this end, we first present a dedicated intersection coordination framework, where the involved vehicles hand over their control authorities and follow instructions from a centralized coordinator. Then a unified cooperative trajectory optimization problem will be formulated to maximize the traffic throughput while ensuring the driving safety and long-term stability of the coordination system. To address the key computational challenges in the real-world deployment, we reformulate this non-convex sequential decision problem into a model-free Markov Decision Process (MDP) and tackle it by devising a Twin Delayed Deep Deterministic Policy Gradient (TD3)-based strategy in the deep reinforcement learning (DRL) framework. Simulation and practical experiments show that the proposed strategy could achieve near-optimal performance in sub-static coordination scenarios and significantly improve the traffic throughput in the realistic continuous traffic flow. The most remarkable advantage is that our strategy could reduce the time complexity of computation to milliseconds, and is shown scalable when the road lanes increase.
ITDec 17, 2024
Distributed satellite information networks: Architecture, enabling technologies, and trendsQinyu Zhang, Liang Xu, Jianhao Huang et al.
Driven by the vision of ubiquitous connectivity and wireless intelligence, the evolution of ultra-dense constellation-based satellite-integrated Internet is underway, now taking preliminary shape. Nevertheless, the entrenched institutional silos and limited, nonrenewable heterogeneous network resources leave current satellite systems struggling to accommodate the escalating demands of next-generation intelligent applications. In this context, the distributed satellite information networks (DSIN), exemplified by the cohesive clustered satellites system, have emerged as an innovative architecture, bridging information gaps across diverse satellite systems, such as communication, navigation, and remote sensing, and establishing a unified, open information network paradigm to support resilient space information services. This survey first provides a profound discussion about innovative network architectures of DSIN, encompassing distributed regenerative satellite network architecture, distributed satellite computing network architecture, and reconfigurable satellite formation flying, to enable flexible and scalable communication, computing and control. The DSIN faces challenges from network heterogeneity, unpredictable channel dynamics, sparse resources, and decentralized collaboration frameworks. To address these issues, a series of enabling technologies is identified, including channel modeling and estimation, cloud-native distributed MIMO cooperation, grant-free massive access, network routing, and the proper combination of all these diversity techniques. Furthermore, to heighten the overall resource efficiency, the cross-layer optimization techniques are further developed to meet upper-layer deterministic, adaptive and secure information services requirements. In addition, emerging research directions and new opportunities are highlighted on the way to achieving the DSIN vision.
NIApr 5
UAV Control and Communication Enabled Low-Altitude Economy: Challenges, Resilient Architecture and Co-design StrategiesTianhao Liang, Nanchi Su, Yuqi Ping et al.
The emerging low-altitude economy has catalyzed the large-scale deployment of unmanned aerial vehicles (UAVs), driving a paradigm shift in environment monitoring, logistics, and emergency response. However, operating within these environments presents notable challenges as pervasive coverage holes, unpredictable interference, and spectrum scarcity. To this end, this article present a communication and control co-design framework to enable a resilient architecture for cellular-connected UAVs. Specifically, we first characterize typical service applications and their stringent performance requirements, followed by a comprehensive analysis of the unique challenges. To bridge the gap between volatile wireless links and rigid flight stability, a three layered architecture is proposed, integrating pre-flight strategic planning, in-flight adaptive action, and system-level resource orchestration. Furthermore, we detail the key enabling technologies for communication and control co-design. Preliminary case studies are proposed to validate that the co-design framework significantly improve the resilience of cellular-connected UAV systems, providing a robust foundation for the evolution of intelligent low-altitude networks.
SYSep 8, 2025
Enhancing Low-Altitude Airspace Security: MLLM-Enabled UAV Intent RecognitionGuangyu Lei, Tianhao Liang, Yuqi Ping et al.
The rapid development of the low-altitude economy emphasizes the critical need for effective perception and intent recognition of non-cooperative unmanned aerial vehicles (UAVs). The advanced generative reasoning capabilities of multimodal large language models (MLLMs) present a promising approach in such tasks. In this paper, we focus on the combination of UAV intent recognition and the MLLMs. Specifically, we first present an MLLM-enabled UAV intent recognition architecture, where the multimodal perception system is utilized to obtain real-time payload and motion information of UAVs, generating structured input information, and MLLM outputs intent recognition results by incorporating environmental information, prior knowledge, and tactical preferences. Subsequently, we review the related work and demonstrate their progress within the proposed architecture. Then, a use case for low-altitude confrontation is conducted to demonstrate the feasibility of our architecture and offer valuable insights for practical system design. Finally, the future challenges are discussed, followed by corresponding strategic recommendations for further applications.
RODec 18, 2024
Energy-Efficient SLAM via Joint Design of Sensing, Communication, and Exploration SpeedZidong Han, Ruibo Jin, Xiaoyang Li et al.
To support future spatial machine intelligence applications, lifelong simultaneous localization and mapping (SLAM) has drawn significant attentions. SLAM is usually realized based on various types of mobile robots performing simultaneous and continuous sensing and communication. This paper focuses on analyzing the energy efficiency of robot operation for lifelong SLAM by jointly considering sensing, communication and mechanical factors. The system model is built based on a robot equipped with a 2D light detection and ranging (LiDAR) and an odometry. The cloud point raw data as well as the odometry data are wirelessly transmitted to data center where real-time map reconstruction is realized based on an unsupervised deep learning based method. The sensing duration, transmit power, transmit duration and exploration speed are jointly optimized to minimize the energy consumption. Simulations and experiments demonstrate the performance of our proposed method.
DCApr 1
Twinning for Space-Air-Ground-Sea Integrated Networks: Beyond Conventional Digital Twin Towards Goal-Oriented Semantic TwinYifei Qiu, Tianle Liao, Xin Jin et al.
A space-air-ground-sea integrated network (SAGSIN) has emerged as a cornerstone of 6G systems, establishing a unified global architecture by integrating multi-domain network resources. Motivated by the demand for real-time situational awareness and intelligent operational maintenance, digital twin (DT) technology was initially regarded as a promising solution, owing to its capability to create virtual replicas and emulate physical system behaviors. However, in the context of SAGSIN, the high-fidelity, full-scale modeling paradigm inherent to conventional DTs encounters fundamental limitations, including prohibitive computational overhead, delayed model synchronization, and cross-system semantic gaps. To address these limitations, this survey paper proposes a novel twinning framework: goal-oriented semantic twin (GOST). Unlike DTs that pursue physical mirroring, GOST prioritizes ``utility'' over ``fidelity,'' leveraging semantic technologies and goal-oriented principles to construct lightweight, task-specific representations. This paper systematically articulates the GOST framework through three layers: knowledge-based semantics, data-driven semantics, and goal-oriented principles. Furthermore, we provide a comprehensive tutorial on constructing GOST by detailing its core enabling technologies and introduce a multidimensional evaluation framework for GOST. We present a case study targeting collaborative tracking tasks in remote satellite-UAV networks, demonstrating that GOST significantly outperforms conventional DTs in timeliness of perceptual data and collaborative tracking. Finally, we outline research directions, establishing GOST as a transformative twinning paradigm to guide the development of SAGSIN.
ITSep 15, 2025
Task-Agnostic Learnable Weighted-Knowledge Base Scheme for Robust Semantic CommunicationsShiyao Jiang, Jian Jiao, Xingjian Zhang et al.
With the emergence of diverse and massive data in the upcoming sixth-generation (6G) networks, the task-agnostic semantic communication system is regarded to provide robust intelligent services. In this paper, we propose a task-agnostic learnable weighted-knowledge base semantic communication (TALSC) framework for robust image transmission to address the real-world heterogeneous data bias in KB, including label flipping noise and class imbalance. The TALSC framework incorporates a sample confidence module (SCM) as meta-learner and the semantic coding networks as learners. The learners are updated based on the empirical knowledge provided by the learnable weighted-KB (LW-KB). Meanwhile, the meta-learner evaluates the significance of samples according to the task loss feedback, and adjusts the update strategy of learners to enhance the robustness in semantic recovery for unknown tasks. To strike a balance between SCM parameters and precision of significance evaluation, we design an SCM-grid extension (SCM-GE) approach by embedding the Kolmogorov-Arnold networks (KAN) within SCM, which leverages the concept of spline refinement in KAN and enables scalable SCM with customizable granularity without retraining. Simulations demonstrate that the TALSC framework effectively mitigates the effects of flipping noise and class imbalance in task-agnostic image semantic communication, achieving at least 12% higher semantic recovery accuracy (SRA) and multi-scale structural similarity (MS-SSIM) compared to state-of-the-art methods.