CVMar 18, 2024Code
Hierarchical Spatial Proximity Reasoning for Vision-and-Language NavigationMing Xu, Zilong Xie
Most Vision-and-Language Navigation (VLN) algorithms are prone to making inaccurate decisions due to their lack of visual common sense and limited reasoning capabilities. To address this issue, we propose a Hierarchical Spatial Proximity Reasoning (HSPR) method. First, we introduce a scene understanding auxiliary task to help the agent build a knowledge base of hierarchical spatial proximity. This task utilizes panoramic views and object features to identify types of nodes and uncover the adjacency relationships between nodes, objects, and between nodes and objects. Second, we propose a multi-step reasoning navigation algorithm based on the hierarchical spatial proximity knowledge base, which continuously plans feasible paths to enhance exploration efficiency. Third, we introduce a residual fusion method to improve navigation decision accuracy. Finally, we validate our approach with experiments on publicly available datasets including REVERIE, SOON, R2R, and R4R. Our code is available at https://github.com/iCityLab/HSPR
LGApr 27, 2025
HetGL2R: Learning to Rank Critical Road Segments via Attributed Heterogeneous Graph Random WalksMing Xu, Jinrong Xiang, Zilong Xie et al.
Accurately identifying critical nodes with high spatial influence in road networks is essential for enhancing the efficiency of traffic management and urban planning. However, existing node importance ranking methods mainly rely on structural features and topological information, often overlooking critical factors such as origin-destination (OD) demand and route information. This limitation leaves considerable room for improvement in ranking accuracy. To address this issue, we propose HetGL2R, an attributed heterogeneous graph learning approach for ranking node importance in road networks. This method introduces a tripartite graph (trip graph) to model the structure of the road network, integrating OD demand, route choice, and various structural features of road segments. Based on the trip graph, we design an embedding method to learn node representations that reflect the spatial influence of road segments. The method consists of a heterogeneous random walk sampling algorithm (HetGWalk) and a Transformer encoder. HetGWalk constructs multiple attribute-guided graphs based on the trip graph to enrich the diversity of semantic associations between nodes. It then applies a joint random walk mechanism to convert both topological structures and node attributes into sequences, enabling the encoder to capture spatial dependencies more effectively among road segments. Finally, a listwise ranking strategy is employed to evaluate node importance. To validate the performance of our method, we construct two synthetic datasets using SUMO based on simulated road networks. Experimental results demonstrate that HetGL2R significantly outperforms baselines in incorporating OD demand and route choice information, achieving more accurate and robust node ranking. Furthermore, we conduct a case study using real-world taxi trajectory data from Beijing, further verifying the practicality of the proposed method.
CLNov 1, 2024
E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented GenerationYun Jiang, Zilong Xie, Wei Zhang et al.
Retrieval-augmented generation methods often neglect the quality of content retrieved from external knowledge bases, resulting in irrelevant information or potential misinformation that negatively affects the generation results of large language models. In this paper, we propose an end-to-end model with adaptive filtering for retrieval-augmented generation (E2E-AFG), which integrates answer existence judgment and text generation into a single end-to-end framework. This enables the model to focus more effectively on relevant content while reducing the influence of irrelevant information and generating accurate answers. We evaluate E2E-AFG on six representative knowledge-intensive language datasets, and the results show that it consistently outperforms baseline models across all tasks, demonstrating the effectiveness and robustness of the proposed approach.