Xiong You

AI
h-index10
4papers
22citations
Novelty50%
AI Score43

4 Papers

CLMay 29Code
The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

Xudong Zhang, Jian Yang, Shengkai Wang et al.

Large Language Model (LLM)-based navigation systems commonly construct explicit spatial representations (e.g., topological graphs, semantic raster maps) and translate them into textual descriptions as LLMs' inputs. However, the linguistic structures of such text-based spatial representations and the choices of contextual features (e.g., topology, geometry) they contain are often treated as neutral engineering decisions rather than key factors that shape LLMs' behavior. To fill the gap, we propose a dual-interventional framework that disentangles linguistic structures from different contextual cues to evaluate the linguistic inductive bias of LLMs for navigation planning. In the framework, representation intervention varies the linguistic format and the degree of linguistic compression, clarifying when linguistic representations support or inhibit navigation planning. Context intervention, combined with contextual feature combination and conflict probing, explicitly clarifies the preferences and weaknesses of LLMs when processing different contextual cues. Experiments across diverse spatial reasoning tasks and multiple model scales reveal a consistent pattern: topological information is a sturdy shield and the backbone of robust planning; linguistic format is a double-edged sword whose effect depends on model size, task demands, and the compression level; and semantic information is a fatal Achilles' heel -- incorrect semantic cues can systematically derail the planning process. Overall, our study shows that effective text-based spatial representations in LLM-based navigation should preserve topological integrity, calibrate representational compression to model capacity, and ensure semantic correctness, rather than simply adopting a single representation. Our code is publicly available at https://github.com/jonesdong150/LLM-Navigation-Inductive-Bias.

AISep 6, 2025
MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial Interactions

Jian Yang, Jiahui Wu, Li Fang et al.

Transforming road network data into vector representations using deep learning has proven effective for road network analysis. However, urban road networks' heterogeneous and hierarchical nature poses challenges for accurate representation learning. Graph neural networks, which aggregate features from neighboring nodes, often struggle due to their homogeneity assumption and focus on a single structural scale. To address these issues, this paper presents MSRFormer, a novel road network representation learning framework that integrates multi-scale spatial interactions by addressing their flow heterogeneity and long-distance dependencies. It uses spatial flow convolution to extract small-scale features from large trajectory datasets, and identifies scale-dependent spatial interaction regions to capture the spatial structure of road networks and flow heterogeneity. By employing a graph transformer, MSRFormer effectively captures complex spatial dependencies across multiple scales. The spatial interaction features are fused using residual connections, which are fed to a contrastive learning algorithm to derive the final road network representation. Validation on two real-world datasets demonstrates that MSRFormer outperforms baseline methods in two road network analysis tasks. The performance gains of MSRFormer suggest the traffic-related task benefits more from incorporating trajectory data, also resulting in greater improvements in complex road network structures with up to 16% improvements compared to the most competitive baseline method. This research provides a practical framework for developing task-agnostic road network representation models and highlights distinct association patterns of the interplay between scale effects and flow heterogeneity of spatial interactions.

ROMay 2, 2025
NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

Xun Li, Jian Yang, Fenli Jia et al.

Recently, camera localization has been widely adopted in autonomous robotic navigation due to its efficiency and convenience. However, autonomous navigation in unknown environments often suffers from scene ambiguity, environmental disturbances, and dynamic object transformation in camera localization. To address this problem, inspired by the biological brain navigation mechanism (such as grid cells, place cells, and head direction cells), we propose a novel neurobiological camera location method, namely NeuroLoc. Firstly, we designed a Hebbian learning module driven by place cells to save and replay historical information, aiming to restore the details of historical representations and solve the issue of scene fuzziness. Secondly, we utilized the head direction cell-inspired internal direction learning as multi-head attention embedding to help restore the true orientation in similar scenes. Finally, we added a 3D grid center prediction in the pose regression module to reduce the final wrong prediction. We evaluate the proposed NeuroLoc on commonly used benchmark indoor and outdoor datasets. The experimental results show that our NeuroLoc can enhance the robustness in complex environments and improve the performance of pose regression by using only a single image.

CVMay 17, 2023
Integrating Multiple Sources Knowledge for Class Asymmetry Domain Adaptation Segmentation of Remote Sensing Images

Kuiliang Gao, Anzhu Yu, Xiong You et al.

In the existing unsupervised domain adaptation (UDA) methods for remote sensing images (RSIs) semantic segmentation, class symmetry is an widely followed ideal assumption, where the source and target RSIs have exactly the same class space. In practice, however, it is often very difficult to find a source RSI with exactly the same classes as the target RSI. More commonly, there are multiple source RSIs available. To this end, a novel class asymmetry RSIs domain adaptation method with multiple sources is proposed in this paper, which consists of four key components. Firstly, a multi-branch segmentation network is built to learn an expert for each source RSI. Secondly, a novel collaborative learning method with the cross-domain mixing strategy is proposed, to supplement the class information for each source while achieving the domain adaptation of each source-target pair. Thirdly, a pseudo-label generation strategy is proposed to effectively combine strengths of different experts, which can be flexibly applied to two cases where the source class union is equal to or includes the target class set. Fourthly, a multiview-enhanced knowledge integration module is developed for the high-level knowledge routing and transfer from multiple domains to target predictions.