Xiaoxu Ma

CV
h-index: 5
7 papers
8 citations
Novelty: 53%
AI Score: 56

7 Papers

LG | Nov 30, 2025 | Code
World Model Robustness via Surprise Recognition

Geigh Zollicoffer, Tanush Chopra, Mingkuan Yan et al.

AI systems deployed in the real world must contend with distractions and out-of-distribution (OOD) noise that can destabilize their policies and lead to unsafe behavior. While robust training can reduce sensitivity to some forms of noise, it is infeasible to anticipate all possible OOD conditions. To mitigate this issue, we develop an algorithm that leverages a world model's inherent measure of surprise to reduce the impact of noise in world model-based reinforcement learning agents. We introduce both multi-representation and single-representation rejection sampling, enabling robustness to settings with multiple faulty sensors or a single faulty sensor. While the introduction of noise typically degrades agent performance, we show that our techniques preserve performance relative to baselines under varying types and levels of noise across multiple environments within self-driving simulation domains (CARLA and Safety Gymnasium). Furthermore, we demonstrate that our methods enhance the stability of two state-of-the-art world models with markedly different underlying architectures: Cosmos and DreamerV3. Together, these results highlight the robustness of our approach across world modeling domains. We release our code at https://github.com/Bluefin-Tuna/WISER.
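
As a rough illustration of using surprise to reject noisy inputs, the sketch below keeps only the sensor streams whose surprise stays within a band estimated from clean data. It is not the paper's algorithm: the function name `reject_surprising`, the mean-plus-k-sigma threshold, and the sample values are all assumptions made for this example.

```python
from statistics import mean, pstdev

def reject_surprising(surprises, history, k=3.0):
    """Return indices of inputs whose surprise looks in-distribution.

    surprises: current per-sensor surprise values (hypothetical, e.g. a
               world model's prediction error for each input stream).
    history:   surprise values recorded under clean conditions.
    k:         assumed threshold width in standard deviations.
    """
    threshold = mean(history) + k * pstdev(history)
    return [i for i, s in enumerate(surprises) if s <= threshold]

clean_history = [0.9, 1.1, 1.0, 1.2, 0.8]
current = [1.05, 7.4, 0.95]  # the second stream is hit by OOD noise
kept = reject_surprising(current, clean_history)
print(kept)  # indices of the streams that are kept
```

With one faulty stream the rule drops only that stream; with several faulty streams the same check applies to each one independently.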

LG | Oct 21, 2024 | Code
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

Xin Ma, Yang Liu, Jingjing Liu et al.

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, and we examine the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond its effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs' applicative reach. Our code is available at https://github.com/soacker/Mesa-Extrapolation.

49.6 | AI | Mar 19
LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs

Xiaoxu Ma, Dong Li, Minglai Shao et al.

Text-attributed graphs, where nodes are enriched with textual attributes, have become a powerful tool for modeling real-world networks such as citation, social, and transaction networks. However, existing methods for learning from these graphs often assume that the distributions of training and testing data are consistent. This assumption leads to significant performance degradation when faced with out-of-distribution (OOD) data. In this paper, we address the challenge of node-level OOD detection in text-attributed graphs, with the goal of maintaining accurate node classification while simultaneously identifying OOD nodes. We propose a novel approach, LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs (LECT), which integrates large language models (LLMs) and energy-based contrastive learning. The proposed method involves generating high-quality OOD samples by leveraging the semantic understanding and contextual knowledge of LLMs to create dependency-aware pseudo-OOD nodes, and applying contrastive learning based on energy functions to distinguish between in-distribution (IND) and OOD nodes. The effectiveness of our method is demonstrated through extensive experiments on six benchmark datasets, where our method consistently outperforms state-of-the-art baselines, achieving both high classification accuracy and robust OOD detection capabilities.
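
For readers unfamiliar with energy functions in OOD detection, the sketch below computes the commonly used energy score, the negative temperature-scaled log-sum-exp of a classifier's logits, and thresholds it. Whether LECT uses exactly this form is an assumption; the logits and the threshold of -3.0 are made up for illustration.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T); lower means more in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -temperature * lse

def is_ood(logits, threshold):
    """Flag a node as OOD when its energy exceeds an assumed threshold."""
    return energy_score(logits) > threshold

confident = [8.0, 0.5, -1.0]  # peaked logits, typical of an IND node
flat = [0.1, 0.0, -0.1]       # flat logits, typical of an OOD node
print(energy_score(confident), energy_score(flat))
print(is_ood(confident, -3.0), is_ood(flat, -3.0))
```

A contrastive objective on top of such scores would push IND nodes toward low energy and pseudo-OOD nodes toward high energy, which widens the gap that the threshold relies on.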

CL | Jan 14
Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations

Xiaoxu Ma, Xiangbo Zhang, Zhenyu Weng

Evaluating personality traits in Large Language Models (LLMs) is key to model interpretation, comparison, and responsible deployment. However, existing questionnaire-based evaluation methods exhibit limited stability and offer little explainability, as their results are highly sensitive to minor variations in prompt phrasing or role-play configurations. To address these limitations, we propose an internal-activation-based approach, termed Persona-Vector Neutrality Interpolation (PVNI), for stable and explainable personality trait evaluation in LLMs. PVNI extracts a persona vector associated with a target personality trait from the model's internal activations using contrastive prompts. It then estimates the corresponding neutral score by interpolating along the persona vector as an anchor axis, enabling an interpretable comparison between the neutral prompt representation and the persona direction. We provide a theoretical analysis of the effectiveness and generalization properties of PVNI. Extensive experiments across diverse LLMs demonstrate that PVNI yields substantially more stable personality trait evaluations than existing methods, even under questionnaire and role-play variants.
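
One simple way to picture a persona vector is as the difference between mean activations for trait-positive and trait-negative prompts, with a neutral prompt located by projection onto that axis. The sketch below does this on toy 2-D vectors. It is an assumed simplification, not the PVNI procedure; all function names and values are hypothetical.

```python
def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def persona_axis(pos_acts, neg_acts):
    """Return the trait-negative mean and the direction toward the trait-positive mean."""
    mu_pos, mu_neg = mean_vec(pos_acts), mean_vec(neg_acts)
    return mu_neg, [p - q for p, q in zip(mu_pos, mu_neg)]

def neutral_position(neutral_act, mu_neg, direction):
    """Position of a neutral activation along the axis: 0 = negative mean, 1 = positive mean."""
    offset = [x - q for x, q in zip(neutral_act, mu_neg)]
    return dot(offset, direction) / dot(direction, direction)

pos = [[2.0, 1.0], [2.0, 3.0]]  # toy activations for trait-positive prompts
neg = [[0.0, 1.0], [0.0, 3.0]]  # toy activations for trait-negative prompts
mu_neg, direction = persona_axis(pos, neg)
score = neutral_position([0.5, 2.5], mu_neg, direction)
print(score)
```

Because the score depends on activations rather than on sampled questionnaire answers, it does not vary with answer sampling, although it still depends on the prompts chosen to build the axis.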

CV | Jan 14
UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval

Xiaoxu Ma, Runhao Li, Hanwen Liu et al.

Effective retrieval across both seen and unseen categories is crucial for modern image retrieval systems. Retrieval on seen categories ensures precise recognition of known classes, while retrieval on unseen categories promotes generalization to novel classes with limited supervision. However, most existing deep hashing methods are confined to a single training paradigm, either pointwise or pairwise, where the former excels on seen categories and the latter generalizes better to unseen ones. To overcome this limitation, we propose Unified Hashing (UniHash), a dual-branch framework that unifies the strengths of both paradigms to achieve balanced retrieval performance across seen and unseen categories. UniHash consists of two complementary branches: a center-based branch following the pointwise paradigm and a pairwise branch following the pairwise paradigm. A novel hash code learning method is introduced to enable bidirectional knowledge transfer between branches, improving hash code discriminability and generalization. It employs a mutual learning loss to align hash representations and introduces a Split-Merge Mixture of Hash Experts (SM-MoH) module to enhance cross-branch exchange of hash representations. Theoretical analysis substantiates the effectiveness of UniHash, and extensive experiments on CIFAR-10, MSCOCO, and ImageNet demonstrate that UniHash consistently achieves state-of-the-art performance in both seen and unseen image retrieval scenarios.
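
A mutual learning loss that aligns hash representations across two branches could, in its simplest form, penalize the squared difference between their relaxed codes. The sketch below assumes a tanh relaxation and a mean-squared-error alignment term; neither is confirmed as UniHash's actual loss, and the branch outputs are invented.

```python
import math

def relaxed_code(features):
    """tanh relaxation of a binary hash code, with values in (-1, 1)."""
    return [math.tanh(x) for x in features]

def mutual_alignment_loss(center_out, pair_out):
    """Mean squared difference between the two branches' relaxed codes."""
    a, b = relaxed_code(center_out), relaxed_code(pair_out)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def binarize(features):
    """Final hash bits: the sign of each output."""
    return [1 if x >= 0 else -1 for x in features]

center_branch = [2.1, -1.7, 0.4, -0.2]  # hypothetical center-based branch output
pair_branch = [1.8, -2.0, -0.3, -0.1]   # hypothetical pairwise branch output
loss = mutual_alignment_loss(center_branch, pair_branch)
disagree = sum(p != q for p, q in zip(binarize(center_branch), binarize(pair_branch)))
print(loss, disagree)
```

Minimizing such a term in both directions lets each branch pull the other toward bits it is confident about, which is the intuition behind bidirectional transfer.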

CV | Oct 9, 2025
Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision

Xiaoxu Ma, Runhao Li, Zhenyu Weng

Deep hashing has been widely adopted for large-scale image retrieval, with numerous strategies proposed to optimize hash function learning. Pairwise-based methods are effective in learning hash functions that preserve local similarity relationships, whereas center-based methods typically achieve superior performance by more effectively capturing global data distributions. However, the strength of center-based methods in modeling global structures often comes at the expense of underutilizing important local similarity information. To address this limitation, we propose Mutual Learning for Hashing (MLH), a novel weak-to-strong framework that enhances a center-based hashing branch by transferring knowledge from a weaker pairwise-based branch. MLH consists of two branches: a strong center-based branch and a weaker pairwise-based branch. Through an iterative mutual learning process, the center-based branch leverages local similarity cues learned by the pairwise-based branch. Furthermore, inspired by the mixture-of-experts paradigm, we introduce a novel mixture-of-hash-experts module that enables effective cross-branch interaction, further enhancing the performance of both branches. Extensive experiments demonstrate that MLH consistently outperforms state-of-the-art hashing methods across multiple benchmark datasets.
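
Hashing-based retrieval typically ranks database items by Hamming distance between binary codes, which is why code quality matters so much. The sketch below shows that ranking step on 8-bit codes stored as integers; the codes are made up, and the learned hash function that would produce them is out of scope here.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query, database):
    """Database indices sorted by increasing Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))

db = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110010
order = rank_by_hamming(query, db)
print(order)
```

XOR followed by a bit count is cheap, so the ranking scales to large databases as long as the codes preserve similarity.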

SI | Dec 29, 2024
Hypergraph-Based Dynamic Graph Node Classification

Xiaoxu Ma, Chen Zhao, Minglai Shao et al.

Node classification on static graphs has achieved significant success, but achieving accurate node classification on dynamic graphs where node topology, attributes, and labels change over time has not been well addressed. Existing methods based on RNNs and self-attention only aggregate features of the same node across different time slices, which cannot adequately address and capture the diverse dynamic changes in dynamic graphs. Therefore, we propose a novel model named Hypergraph-Based Multi-granularity Dynamic Graph Node Classification (HYDG). After obtaining basic node representations for each slice through a GNN backbone, HYDG models the representations of each node in the dynamic graph through two modules. The individual-level hypergraph captures the spatio-temporal node representations between individual nodes, while the group-level hypergraph captures the multi-granularity group temporal representations among nodes of the same class. Each hyperedge captures different temporal dependencies of varying lengths by connecting multiple nodes within specific time ranges. More accurate representations are obtained through weighted information propagation and aggregation by the hypergraph neural network. Extensive experiments on five real dynamic graph datasets using two GNN backbones demonstrate the superiority of our proposed framework.