Yu Ning

CVDec 2, 2025

Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

Zeqi Xiao, Yiwei Zhao, Lingxiao Li et al.

We investigate whether video generative models can exhibit visuospatial intelligence, a capability central to human cognition, using only visual data. To this end, we present Video4Spatial, a framework showing that video diffusion models conditioned solely on video-based scene context can perform complex spatial tasks. We validate on two tasks: scene navigation - following camera-pose instructions while remaining consistent with 3D geometry of the scene, and object grounding - which requires semantic localization, instruction following, and planning. Both tasks use video-only inputs, without auxiliary modalities such as depth or poses. With simple yet effective design choices in the framework and data curation, Video4Spatial demonstrates strong spatial understanding from video context: it plans navigation and grounds target objects end-to-end, follows camera-pose instructions while maintaining spatial consistency, and generalizes to long contexts and out-of-domain environments. Taken together, these results advance video generative models toward general visuospatial reasoning.

1.0NIApr 27

A method for detecting spatio-temporal correlation anomalies of WSN nodes based on topological information enhancement and time-frequency feature extraction

Miao Ye, Ziheng Wang, Qiuxiang Jiang et al.

Existing anomaly detection methods for Wireless Sensor Networks (WSNs) generally suffer from insufficient extraction of spatio-temporal correlation features, reliance on either timedomain or frequencydomain information alone, and high computational overhead. To address these limitations, this paper proposes a topology-enhanced spatio-temporal feature fusion anomaly detection method, TE-MSTAD. First, building upon the RWKV model with linear attention mechanisms, a Cross modal Feature Extraction (CFE) module is introduced to fully extract spatial correlation features among multiple nodes while reducing computational resource consumption. Second, a strategy is designed to construct an adjacency matrix by jointly learning spatial correlation from time-frequency domain features. Different graph neural networks are integrated to enhance spatial correlation feature extraction, thereby fully capturing spatial relationships among multiple nodes. Finally, a dualbranch network TE-MSTAD is designed for time-frequency domain feature fusion, overcoming the limitations of relying solely on the time or frequency domain to improve WSN anomaly detection performance. Testing on both public and realworld datasets demonstrates that the TE-MSTAD model achieves F1 scores of 92.52% and 93.28%, respectively, exhibiting superior detection performance and generalization capabilities compared to existing methods.

Yu Ning

2 Papers