PF Mar 31
Closed-Loop Integrated Sensing, Communication, and Control for Efficient Drone Flight
Jingli Li, Yiyan Ma, Bo Ai et al.
Low-altitude wireless networks (LAWN) require drones to follow specific trajectories controlled by ground base stations (GBSs). However, given complex low-altitude channel conditions and limited spectrum and power resources, sensing errors and wireless link unreliability cannot be ignored, leading to trajectory deviations that threaten flight safety. To address this issue, this paper proposes an integrated sensing-communication-control (ISCC) closed-loop trajectory tracking approach, aiming to reveal the coupling mechanisms among communication, sensing, and control during drone flight. In detail, we incorporate sensing errors in trajectory state estimation, packet losses in control command transmission, and finite blocklength (FBL) transmission effects into the closed-loop dynamics. First, through theoretical analysis, we identify the dominant role of the time-frequency resources allocated to control in ensuring system stability and derive a lower bound on the resources required to guarantee stable operation. Second, to minimize tracking error, we formulate a time-frequency resource allocation optimization problem for the sensing, communication, and control components, subject to constraints on communication rate and closed-loop stability. Accordingly, a solution algorithm based on successive convex approximation is proposed. Third, simulation results indicate that once stability is ensured, system performance is primarily determined by sensing accuracy, with the trajectory tracking error exhibiting an approximately linear dependence on the position error bound. Finally, it is shown that the proposed ISCC scheme avoids trajectory divergence under FBL transmission, unlike ISCC designs that ignore control packet loss, and can achieve decimeter-level average tracking accuracy, reducing the error to only 17.37% of that observed in the baseline global navigation satellite system scheme.
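As a rough illustration of why finite blocklength transmission matters for control-command reliability, the sketch below evaluates the well-known normal approximation for packet error probability. It is not the paper's model: the AWGN channel assumption, the function name `fbl_packet_error`, and all example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def fbl_packet_error(snr_linear, blocklength, info_bits):
    """Approximate packet error probability at finite blocklength.

    Uses the normal approximation for a real-valued-rate AWGN channel
    (an assumption made here, not taken from the abstract):
        error ~ Q((C - R) / sqrt(V / n))
    where C is capacity, V is channel dispersion, n is the blocklength
    in channel uses, and R = info_bits / n is the coding rate.
    """
    capacity = math.log2(1.0 + snr_linear)
    dispersion = (1.0 - (1.0 + snr_linear) ** -2) * math.log2(math.e) ** 2
    rate = info_bits / blocklength
    arg = (capacity - rate) / math.sqrt(dispersion / blocklength)
    # Q(x) = 1 - Phi(x), with Phi the standard normal CDF.
    return 1.0 - NormalDist().cdf(arg)


if __name__ == "__main__":
    # Hypothetical control packet of 256 bits at 10 dB SNR (linear SNR 10).
    for n in (80, 100, 150):
        print(n, fbl_packet_error(10.0, n, 256))
```

For a fixed packet size, allocating more channel uses lowers the coding rate and the approximate error probability, which is consistent with the abstract's point that the resources given to control govern stability.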
CV Apr 2
Environment-Aware Channel Prediction for Vehicular Communications: A Multimodal Visual Feature Fusion Framework
Xuejian Zhang, Ruisi He, Minseok Kim et al.
The deep integration of communication with intelligence and sensing, as a defining vision of 6G, renders environment-aware channel prediction a key enabling technology. As a representative 6G application, vehicular communications require accurate and forward-looking channel prediction under stringent reliability, latency, and adaptability demands. Traditional empirical and deterministic models remain limited in balancing accuracy, generalization, and deployability, while the growing availability of onboard and roadside sensing devices offers a promising source of environmental priors. This paper proposes an environment-aware channel prediction framework based on multimodal visual feature fusion. Using GPS data and vehicle-side panoramic RGB images, together with semantic segmentation and depth estimation, the framework extracts semantic, depth, and position features through a three-branch architecture and performs adaptive multimodal fusion via a squeeze-excitation attention gating module. For 360-dimensional angular power spectrum (APS) prediction, a dedicated regression head and a composite multi-constraint loss are further designed. As a result, joint prediction of path loss (PL), delay spread (DS), azimuth spread of arrival (ASA), azimuth spread of departure (ASD), and APS is achieved. Experiments on a synchronized urban V2I measurement dataset yield the best root mean square error (RMSE) of 3.26 dB for PL, RMSEs of 37.66 ns, 5.05 degrees, and 5.08 degrees for DS, ASA, and ASD, respectively, and mean/median APS cosine similarities of 0.9342/0.9571, demonstrating strong accuracy, generalization, and practical potential for intelligent channel prediction in 6G vehicular communications.
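The two evaluation metrics quoted above, RMSE and cosine similarity between angular power spectra, can be sketched as follows. This is a minimal illustration of the standard definitions; the function names and toy inputs are assumptions, and the paper's exact evaluation procedure (for example, whether APS values are compared in linear or dB scale) is not stated in the abstract.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def cosine_similarity(a, b):
    """Cosine similarity between two vectors, e.g. two 360-bin APS vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy values only; these are not measurements from the paper.
    print(rmse([100.0, 105.0], [103.0, 101.0]))
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Cosine similarity compares only the shape of the two spectra, so a prediction that is off by a constant scale factor still scores 1, whereas RMSE penalizes any absolute offset.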
IT Oct 31, 2024
COST CA20120 INTERACT Framework of Artificial Intelligence Based Channel Modeling
Ruisi He, Nicola D. Cicco, Bo Ai et al.
Accurate channel models are a prerequisite for communication-theoretic investigations as well as system design. Channel modeling generally relies on statistical and deterministic approaches. However, traditional modeling methods still have significant limitations in terms of accuracy, generalization ability, and computational complexity. The fundamental reason is that establishing a quantified and accurate mapping between the physical environment and channel characteristics is becoming increasingly challenging for modern communication systems. Here, in the context of the COST CA20120 Action, we evaluate and discuss the feasibility and implementation of using artificial intelligence (AI) for channel modeling, and explore where the future of this field lies. First, we present a framework of AI-based channel modeling to characterize complex wireless channels. Then, we highlight in detail some major challenges and present possible solutions: i) estimating the uncertainty of AI-based channel predictions, ii) integrating prior knowledge of propagation to improve generalization capabilities, and iii) interpretable AI for channel modeling. We present and discuss illustrative numerical results to showcase the capabilities of AI-based channel modeling.
CV Jan 25, 2025
Vision Aided Channel Prediction for Vehicular Communications: A Case Study of Received Power Prediction Using RGB Images
Xuejian Zhang, Ruisi He, Mi Yang et al.
The communication scenarios and channel characteristics of 6G will be more complex and difficult to characterize. Conventional methods for channel prediction face challenges in achieving an optimal balance between accuracy, practicality, and generalizability. Additionally, they often fail to effectively leverage environmental features. With the integration of communication and artificial intelligence as a pivotal development vision for 6G, it is imperative to achieve intelligent prediction of channel characteristics. Vision-aided methods have been employed in various wireless communication tasks other than channel prediction, and have demonstrated enhanced efficiency and performance. In this paper, we propose a vision-aided two-stage model for channel prediction in millimeter wave vehicular communication scenarios, realizing accurate received power prediction using only RGB images. First, we obtain original images of the propagation environment through an RGB camera. Second, three typical computer vision methods, namely object detection, instance segmentation, and binary masking, are employed to extract environmental information from the original images in stage 1, and received power is predicted from the processed images in stage 2. Pre-trained YOLOv8 and ResNets are used in stages 1 and 2, respectively, and fine-tuned on datasets. Finally, we conduct five experiments to evaluate the performance of the proposed model, demonstrating its feasibility, accuracy, and generalization capabilities. The model proposed in this paper offers novel solutions for achieving intelligent channel prediction in vehicular communications.
CV Dec 17, 2025
Step-GUI Technical Report
Haolong Yan, Jia Wang, Xin Huang et al.
Recent advances in multimodal large language models unlock unprecedented opportunities for GUI automation. However, a fundamental challenge remains: how to efficiently acquire high-quality training data while maintaining annotation reliability? We introduce a self-evolving training pipeline powered by the Calibrated Step Reward System, which converts model-generated trajectories into reliable training signals through trajectory-level calibration, achieving >90% annotation accuracy with 10-100x lower cost. Leveraging this pipeline, we introduce Step-GUI, a family of models (4B/8B) that achieves state-of-the-art GUI performance (8B: 80.2% AndroidWorld, 48.5% OSWorld, 62.6% ScreenShot-Pro) while maintaining robust general capabilities. As GUI agent capabilities improve, practical deployment demands standardized interfaces across heterogeneous devices while protecting user privacy. To this end, we propose GUI-MCP, the first Model Context Protocol for GUI automation with hierarchical architecture that combines low-level atomic operations and high-level task delegation to local specialist models, enabling high-privacy execution where sensitive data stays on-device. Finally, to assess whether agents can handle authentic everyday usage, we introduce AndroidDaily, a benchmark grounded in real-world mobile usage patterns with 3146 static actions and 235 end-to-end tasks across high-frequency daily scenarios (8B: static 89.91%, end-to-end 52.50%). Our work advances the development of practical GUI agents and demonstrates strong potential for real-world deployment in everyday digital interactions.
SP Mar 3, 2025
A CGAN-LSTM-Based Framework for Time-Varying Non-Stationary Channel Modeling
Keying Guo, Ruisi He, Mi Yang et al.
Time-varying non-stationary channels, with complex dynamic variations and temporal evolution characteristics, pose significant challenges for channel modeling and communication system performance evaluation. Most existing methods of time-varying channel modeling focus on predicting the channel state at a given moment or simulating short-term channel fluctuations, and are unable to capture the long-term evolution of the channel. This paper emphasizes the generation of long-term dynamic channels to fully capture the evolution of non-stationary channel properties. The generated channel not only reflects temporal dynamics but also ensures consistent stationarity. We propose a hybrid deep learning framework that combines conditional generative adversarial networks (CGAN) with long short-term memory (LSTM) networks. A stationarity-constrained approach is designed to ensure temporal correlation of the generated time-series channel. This method can generate channels with the required temporal non-stationarity. The model is validated by comparing channel statistical features, and the results show that the generated channel is in good agreement with the raw channel and provides good performance in terms of non-stationarity.