Xujun Li

AI
h-index21
4papers
16citations
Novelty55%
AI Score52

4 Papers

97.1AIMay 27
You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Xujun Li, Kehan Zheng, Mingyuan Zhao et al.

Test-time skill evolving is regarded as a new paradigm for enhancing deployed agentic systems. Existing works mainly focus on hard-coded skill evolving strategies or parametric learning that rely on expensive parameter updates in the underlying LLMs. In this paper, we demonstrate that test-time refinement of the skill evolving framework itself is necessary for continuous improvement of the agent systems in different downstream scenarios, and lightweight algorithmic adaptation is feasible. Specifically, we propose HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from agents' task execution traces. Experiments on diverse agentic benchmarks show that meta-evolving can produce a higher-quality skill library than pure skill evolving and can derive diverse meta-skills for different scenarios, thereby facilitating future continual experience learning. Our code is temporarily public at https://anonymous.4open.science/r/HiSME-BD45.

IVSep 24, 2024Code
ManiNeg: Manifestation-guided Multimodal Pretraining for Mammography Classification

Xujun Li, Xin Wei, Jing Jiang et al.

Breast cancer is a significant threat to human health. Contrastive learning has emerged as an effective method to extract critical lesion features from mammograms, thereby offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning involves negative sampling, where the selection of appropriate hard negative samples is essential for driving representations to retain detailed information about lesions. In contrastive learning, it is often assumed that features can sufficiently capture semantic content, and that each minibatch inherently includes ideal hard negative samples. However, the characteristics of breast lumps challenge these assumptions. In response, we introduce ManiNeg, a novel approach that leverages manifestations as proxies to mine hard negative samples. Manifestations, which refer to the observable symptoms or signs of a disease, provide a knowledge-driven and robust basis for choosing hard negative samples. This approach benefits from its invariance to model optimization, facilitating efficient sampling. To support ManiNeg and future research endeavors, we developed the MVKL dataset, which includes multi-view mammograms, corresponding reports, meticulously annotated manifestations, and pathologically confirmed benign-malignant outcomes. We evaluate ManiNeg on the benign and malignant classification task. Our results demonstrate that ManiNeg not only improves representation in both unimodal and multimodal contexts but also shows generalization across datasets. The MVKL dataset and our codes are publicly available at https://github.com/wxwxwwxxx/ManiNeg.

LGOct 30, 2025
Data-Efficient RLVR via Off-Policy Influence Guidance

Erle Zhu, Dazhi Jiang, Yuan Wang et al.

Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection methods are largely heuristic-based, lacking theoretical guarantees and generalizability. This work proposes a theoretically-grounded approach using influence functions to estimate the contribution of each data point to the learning objective. To overcome the prohibitive computational cost of policy rollouts required for online influence estimation, we introduce an off-policy influence estimation method that efficiently approximates data influence using pre-collected offline trajectories. Furthermore, to manage the high-dimensional gradients of LLMs, we employ sparse random projection to reduce dimensionality and improve storage and computation efficiency. Leveraging these techniques, we develop \textbf{C}urriculum \textbf{R}L with \textbf{O}ff-\textbf{P}olicy \text{I}nfluence guidance (\textbf{CROPI}), a multi-stage RL framework that iteratively selects the most influential data for the current policy. Experiments on models up to 7B parameters demonstrate that CROPI significantly accelerates training. On a 1.5B model, it achieves a 2.66x step-level acceleration while using only 10\% of the data per stage compared to full-dataset training. Our results highlight the substantial potential of influence-based data selection for efficient RLVR.

AIJan 18, 2025Code
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science

Erle Zhu, Yadi Liu, Zhe Zhang et al.

Pre-trained on extensive text and image corpora, current Multi-Modal Large Language Models (MLLM) have shown strong capabilities in general visual reasoning tasks. However, their performance is still lacking in physical domains that require understanding diagrams with complex physical structures and quantitative analysis based on multi-modal information. To address this, we develop a new framework, named Multi-Modal Scientific Reasoning with Physics Perception and Simulation (MAPS) based on an MLLM. MAPS decomposes expert-level multi-modal reasoning task into physical diagram understanding via a Physical Perception Model (PPM) and reasoning with physical knowledge via a simulator. The PPM module is obtained by fine-tuning a visual language model using carefully designed synthetic data with paired physical diagrams and corresponding simulation language descriptions. At the inference stage, MAPS integrates the simulation language description of the input diagram provided by PPM and results obtained through a Chain-of-Simulation process with MLLM to derive the underlying rationale and the final answer. Validated using our collected college-level circuit analysis problems, MAPS significantly improves reasoning accuracy of MLLM and outperforms all existing models. The results confirm MAPS offers a promising direction for enhancing multi-modal scientific reasoning ability of MLLMs. We will release our code, model and dataset used for our experiments upon publishing of this paper.