56.2CVMay 3Code
Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and ReasoningSixian Zhang, Yiyao Wang, Xinhang Song et al.
Understanding the geometric and semantic structure of environments is essential for embodied navigation and reasoning. Existing semantic mapping methods trade off between explicit geometry and multi-scale semantics, and lack a native interface for large models, thus requiring additional training of feature projection for semantic alignment. To this end, we propose the multi-scale Gaussian-Language Map (GLMap), which introduces three key designs: (1) explicit geometry, (2) multi-scale semantics covering both instance and region concepts, and (3) a dual-modality interface where each semantic unit jointly stores a natural language description and a 3D Gaussian representation. The 3D Gaussians enable compact storage and fast rendering of task-relevant images via Gaussian splatting. To enable efficient incremental construction, we further propose a Gaussian Estimator that analytically derives Gaussian parameters from dense point clouds without gradient-based optimization. Experiments on ObjectNav, InstNav, and SQA tasks show that GLMap effectively enhances target navigation and contextual reasoning, while remaining compatible with large-model-based methods in a zero-shot manner. The code is available at https://github.com/sx-zhang/GLMap.
56.2LGMar 10Code
DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation LearningDeyi Li, Zijun Yao, Qi Xu et al.
The growing adoption of electronic health record (EHR) systems has provided unprecedented opportunities for predictive modeling to guide clinical decision making. Structured EHRs contain longitudinal observations of patients across hospital visits, where each visit is represented by a set of medical codes. While sequence-based, graph-based, and graph-enhanced sequence approaches have been developed to capture rich code interactions over time or within the same visits, they often overlook the inherent heterogeneous roles of medical codes arising from distinct clinical characteristics and contexts. To this end, in this study we propose the Disease Trajectory-aware Transformer for EHR (DT-BEHRT), a graph-enhanced sequential architecture that disentangles disease trajectories by explicitly modeling diagnosis-centric interactions within organ systems and capturing asynchronous progression patterns. To further enhance the representation robustness, we design a tailored pre-training methodology that combines trajectory-level code masking with ontology-informed ancestor prediction, promoting semantic alignment across multiple modeling modules. Extensive experiments on multiple benchmark datasets demonstrate that DT-BEHRT achieves strong predictive performance and provides interpretable patient representations that align with clinicians' disease-centered reasoning. The source code is publicly accessible at https://github.com/GatorAIM/DT-BEHRT.git.
LGMay 28, 2019
Solving NP-Hard Problems on Graphs with Extended AlphaGo ZeroKenshin Abe, Zijian Xu, Issei Sato et al.
There have been increasing challenges to solve combinatorial optimization problems by machine learning. Khalil et al. proposed an end-to-end reinforcement learning framework, S2V-DQN, which automatically learns graph embeddings to construct solutions to a wide range of problems. To improve the generalization ability of their Q-learning method, we propose a novel learning strategy based on AlphaGo Zero which is a Go engine that achieved a superhuman level without the domain knowledge of the game. Our framework is redesigned for combinatorial problems, where the final reward might take any real number instead of a binary response, win/lose. In experiments conducted for five kinds of NP-hard problems including {\sc MinimumVertexCover} and {\sc MaxCut}, our method is shown to generalize better to various graphs than S2V-DQN. Furthermore, our method can be combined with recently-developed graph neural network (GNN) models such as the \emph{Graph Isomorphism Network}, resulting in even better performance. This experiment also gives an interesting insight into a suitable choice of GNN models for each task.