Xue Han

CL
h-index: 34
21 papers
195 citations
Novelty: 51%
AI Score: 58

21 Papers

84.0 · CL · May 27 · Code
ChildEval: When large language models meet children's personalities

Yanyan Luo, Xue Han, Chunxu Zhao et al.

While LLMs enable personalized chatbots, their effectiveness in child-centered personalization remains unclear, as systematic evaluation of child-specific preferences is still lacking. To address this gap, we introduce ChildEval, a benchmark for evaluating LLMs' ability to infer and follow child-centered preferences in long-context conversations. ChildEval contains 29K synthesized persona profiles of children aged 3-6, providing relatively static background information. Each persona is associated with a child preference, which may align with, conflict with, or be independent of the persona, and which is expressed either explicitly in a single sentence or implicitly through 6-10 turn dialogues. Explicit and implicit preferences are designed to reflect the same underlying preference but differ in expression, capturing dynamic aspects of preference expression rather than changes in the static persona. The benchmark spans five top-level and fourteen sub-level categories covering children's daily lives and development. We further propose fine-grained, child-centric evaluation protocols to systematically assess open-source LLMs. Experimental results demonstrate how different personalized representations affect LLM responses and suggest that finetuning on ChildEval can enhance child-centered performance. Our code and dataset are available at https://github.com/ziyanluo/ChildEval.

CL · Aug 21, 2024 · Code
MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Hao Zhou, Zhijun Wang, Shujian Huang et al.

Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the original languages' abilities. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, indicating the challenge of balancing language expansion while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance the multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus on improving ability in the expanded languages, without using any original language data. Then, the model reviews the knowledge of the original languages with replay data amounting to less than 1% of post-pretraining, where we incorporate language priors routing to better recover the abilities of the original languages. Evaluations on multiple benchmarks show that MoE-LPR outperforms other post-pretraining methods. Freezing original parameters preserves original language knowledge while adding new experts preserves the learning ability. Reviewing with LPR enables effective utilization of multilingual knowledge within the parameters. Additionally, the MoE architecture maintains the same inference overhead while increasing total model parameters. Extensive experiments demonstrate MoE-LPR's effectiveness in improving expanded languages and preserving original language proficiency with superior scalability. Code and scripts are freely available at https://github.com/zjwang21/MoE-LPR.git.

AI · May 9, 2022
AI Based Digital Twin Model for Cattle Caring

Xue Han, Zihuai Lin

In this paper, we developed innovative digital twins of cattle status that are powered by artificial intelligence (AI). The work was built on a farm IoT system that remotely monitors and tracks the state of cattle. A digital twin model of cattle health based on Deep Learning (DL) was generated using the sensor data acquired from the farm IoT system. The health and physiological cycle of cattle can be monitored in real time, and the state of the next physiological cycle of cattle can be anticipated using this model. The basis of this work is the vast amount of data which is required to validate the legitimacy of the digital twins model. In terms of behavioural state, it was found that the cattle treated with a combination of topical anaesthetic and meloxicam exhibit the least pain reaction. The digital twins model developed in this work can be used to monitor the health of cattle.

SY · Jun 13, 2016
Method and Taxonomy for Evaluation of Distributed Control Strategies for Distributed Energy Resources

Xue Han, Kai Heussen, Oliver Gehrke et al.

Distributed control strategies applied to power distribution control problems are meant to offer robust and scalable integration of distributed energy resources. However, the term "distributed control" is often loosely applied to a variety of very different control strategies. This leads to problems in the design phase as well as with the performance evaluation of such systems. This paper develops a framework to assist the design and evaluation of distributed control strategies, supported by a rigorous taxonomy of control strategies and formal design criteria. The proposed framework and taxonomy are evaluated against the state of the art of control strategies as found in literature.

CL · Mar 9, 2023
ESCL: Equivariant Self-Contrastive Learning for Sentence Representations

Jie Liu, Yixuan Liu, Xue Han et al.

Previous contrastive learning methods for sentence representations often focus on insensitive transformations to produce positive pairs, but neglect the role of sensitive transformations that are harmful to semantic representations. Therefore, we propose an Equivariant Self-Contrastive Learning (ESCL) method to make full use of sensitive transformations, which encourages the learned representations to be sensitive to certain types of transformations with an additional equivariant learning task. Meanwhile, in order to improve practicability and generality, ESCL simplifies the implementations of traditional equivariant contrastive methods to share model parameters from the perspective of multi-task learning. We evaluate our ESCL on semantic textual similarity tasks. The proposed method achieves better results while using fewer learning parameters compared to previous methods.

LG · Aug 4, 2023
High-Accuracy Prediction of Metal-Insulator-Metal Metasurface with Deep Learning

Kaizhu Liu, Hsiang-Chen Chui, Changsen Sun et al.

Deep learning prediction of electromagnetic software calculation results has been widely discussed in recent years, but prediction accuracy remains a challenge. In this work, we propose using a ResNets-10 model to predict the S11 parameters of plasmonic metasurfaces. Training was performed in two stages, using k-fold cross-validation and a small learning rate. After training, the prediction loss for aluminum, gold, and silver metal-insulator-metal metasurfaces was -48.45, -46.47, and -35.54, respectively. Owing to this ultralow error, the proposed network can replace the traditional electromagnetic computing method for calculations within a certain structural range. In addition, the network completes training in fewer than 1,100 epochs, which can effectively shorten the design process. The proposed ResNets-10 model can also be used to design meta-diffractive devices and biosensors, thereby reducing the time required for calculation. The ultralow error of the network indicates that this work contributes to the development of future artificial intelligence electromagnetic computing software.
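
The abstract above mentions k-fold cross-validation as part of training. As a generic illustration only (not the authors' pipeline, whose fold count and data layout are not given here), the index bookkeeping behind k-fold splitting can be sketched as follows. The function name `k_fold_indices` and the sample sizes are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds.

    Generic sketch of k-fold splitting; names and sizes are illustrative.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Hypothetical usage: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each sample lands in exactly one validation fold, so every sample is used for validation once and for training in the remaining folds.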

CL · Feb 5
Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Junxiao Liu, Zhijun Wang, Yixiao Li et al.

Long reasoning models often struggle in multilingual settings: they tend to reason in English for non-English questions; when constrained to reasoning in the question language, accuracies drop substantially. The struggle is caused by the limited abilities for both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates the training of translation into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains up to 8.4 COMET points on FLORES-200.

51.9 · CV · May 14
MambaRain: Multi-Scale Mamba-Attention Framework for 0-3 Hour Precipitation Nowcasting

Chunlei Shi, Cui Wu, Xiang Xu et al.

Accurate precipitation nowcasting over extended horizons (0-3 hours) is essential for disaster mitigation and operational decision-making, yet remains a critical challenge in the field. Existing deterministic approaches are predominantly constrained to shorter prediction windows (0-2 hours), exhibiting severe performance degradation beyond 90 minutes owing to their inherent difficulty in capturing long-range spatiotemporal dependencies from radar-derived observations. To address these fundamental limitations, we propose MambaRain, a novel multi-scale encoder-decoder architecture that synergistically integrates Mamba's linear-complexity long-range temporal modeling with self-attention mechanisms for explicit spatial correlation capture. The core innovation lies in a hybrid design paradigm wherein Mamba blocks leverage selective state space mechanisms to model global temporal dynamics across extended sequences with computational efficiency, while self-attention modules explicitly characterize spatial correlations within precipitation fields, a capability inherently absent in Mamba's sequential processing paradigm. This complementary synergy enables comprehensive spatiotemporal representation learning, effectively extending the viable forecasting horizon to 2-3 hours with substantial accuracy improvements. Furthermore, we introduce a spectral loss formulation to mitigate blurring artifacts characteristic of chaotic precipitation systems, thereby preserving fine-scale motion details critical for nowcasting accuracy. Experimental validation demonstrates that MambaRain substantially outperforms existing deterministic methodologies in 0-3 hour nowcasting tasks, with particularly pronounced performance gains in the challenging 2-3 hour prediction range.

CL · Feb 3
PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning

Yunzhi Shen, Hao Zhou, Xin Huang et al.

Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the contributions of translation and post-editing objectives, yielding a biased yet more sample-efficient estimator. Experiments on English→Finnish, English→Turkish, and English↔Chinese show consistent gains over RL baselines, and for English→Turkish, performance on COMET-KIWI is comparable to advanced LLM-based systems (DeepSeek-V3.2).

CL · Jun 24, 2024 · Code
Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

Peng Hu, Sizhe Liu, Changjiang Gao et al.

Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separate components: knowledge retrieval and knowledge-free reasoning, and analyze the relationship between cross-lingual transferability and these two components. With adapted commonsense reasoning datasets and constructed knowledge-free reasoning datasets, we show that the knowledge-free reasoning capability can be nearly perfectly transferred across various source-target language directions despite the secondary impact of resource in some specific target languages, while cross-lingual knowledge retrieval significantly hinders the transfer. Moreover, by analyzing the hidden states and feed-forward network neuron activation during the reasoning, we show that higher similarity of hidden representations and larger overlap of activated neurons could explain the better cross-lingual transferability of knowledge-free reasoning than knowledge retrieval. Thus, we hypothesize that knowledge-free reasoning shares similar neurons in different languages for reasoning, while knowledge is stored separately in different languages. Our code and data are available at: https://github.com/NJUNLP/Knowledge-Free-Reasoning.

SE · Apr 10, 2018 · Code
ConPredictor: Concurrency Defect Prediction in Real-World Applications

Tingting Yu, Wei Wen, Xue Han et al.

Concurrent programs are difficult to test due to their inherent non-determinism. To address this problem, testing often requires the exploration of thread schedules of a program; this can be time-consuming when applied to real-world programs. Software defect prediction has been used to help developers find faults and prioritize their testing efforts. Prior studies have used machine learning to build such predicting models based on designed features that encode the characteristics of programs. However, research has focused on sequential programs; to date, no work has considered defect prediction for concurrent programs, with program characteristics distinguished from sequential programs. In this paper, we present ConPredictor, an approach to predict defects specific to concurrent programs by combining both static and dynamic program metrics. Specifically, we propose a set of novel static code metrics based on the unique properties of concurrent programs. We also leverage additional guidance from dynamic metrics constructed based on mutation analysis. Our evaluation on four large open source projects shows that ConPredictor improved both within-project defect prediction and cross-project defect prediction compared to traditional features.

CL · May 22, 2024
Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

Shimao Zhang, Changjiang Gao, Wenhao Zhu et al.

Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages, even including those unseen during instruction-tuning. Additionally, we utilize different settings and mechanistic interpretability methods to analyze the LLM's performance in the multilingual scenario comprehensively. Our work suggests that LLMs have enormous potential for improving multilingual alignment efficiently with great language and task generalization.

CL · Apr 2, 2025
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training

Zhijun Wang, Jiahuan Li, Hao Zhou et al.

Large language models (LLMs) exhibit remarkable multilingual capabilities despite the extreme language imbalance in the pre-training data. In this paper, we closely examine the reasons behind this phenomenon, focusing on the pre-training corpus. We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. We conduct an analysis to investigate code-switching in the pre-training corpus, examining its presence and categorizing it into four types within two quadrants. We then assess its impact on multilingual performance. These types of code-switching data are unbalanced in proportions and demonstrate different effects on facilitating language transfer. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching. We continuously scale up the synthetic code-switching data and observe remarkable improvements in both benchmarks and representation space. Extensive experiments indicate that incorporating synthetic code-switching data enables better language alignment and generalizes well to high, medium, and low-resource languages with pre-training corpora of varying qualities.

CL · Apr 15, 2025
Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From

Changjiang Gao, Hankun Lin, Xin Huang et al.

Cross-lingual context retrieval (extracting contextual information in one language based on requests in another) is a fundamental aspect of cross-lingual alignment, but its performance and mechanism in large language models (LLMs) remain unclear. In this paper, we evaluate the cross-lingual context retrieval of over 40 LLMs across 12 languages, using cross-lingual machine reading comprehension (xMRC) as a representative scenario. Our results show that post-trained open LLMs exhibit strong cross-lingual context retrieval ability, comparable to closed-source LLMs such as GPT-4o, and their estimated oracle performances greatly improve after post-training. Our mechanism analysis shows that the cross-lingual context retrieval process can be divided into two main phases: question encoding and answer retrieval, which are formed in pre-training and post-training respectively. The phasing stability correlates with xMRC performance, and the xMRC bottleneck lies at the last model layers in the second phase, where the effect of post-training can be evidently observed. Our results also indicate that larger-scale pretraining cannot improve the xMRC performance. Instead, larger LLMs need further multilingual post-training to fully unlock their cross-lingual context retrieval potential.

CL · Mar 6, 2025
Temporal Alignment of LLMs through Cycle Encoding for Long-Range Time Representations

Xue Han, Qian Hu, Yitong Wang et al.

Large language models (LLMs) suffer from temporal misalignment issues, especially across long spans of time. The issue arises because LLMs are trained on large amounts of data in which temporal information is rather sparse over long periods, such as thousands of years, resulting in insufficient learning or catastrophic forgetting by the LLMs. This paper proposes a methodology named "Ticktack" for addressing the LLM's long-time-span misalignment in a yearly setting. Specifically, we first propose to utilize the sexagenary year expression instead of the Gregorian year expression employed by LLMs, achieving a more uniform distribution in yearly granularity. Then, we employ polar coordinates to model the sexagenary cycle of 60 terms and the year order within each term, with additional temporal encoding to ensure LLMs understand them. Finally, we present a temporal representational alignment approach for post-training LLMs that effectively distinguishes time points with relevant knowledge, hence improving performance on time-related tasks, particularly over a long period. We also create a long time span benchmark for evaluation. Experimental results prove the effectiveness of our proposal.
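
To make the sexagenary and polar-coordinate idea in the abstract above concrete, here is a minimal sketch. The conversion `(year - 4) % 60` is the standard mapping from a Gregorian (CE) year to its position in the 60-year cycle. The polar mapping in `polar_year` (angle from the position in the cycle, radius from the cycle count) is an assumption made for illustration; the abstract does not specify Ticktack's actual encoding, and all function names here are hypothetical.

```python
import math

# Romanized heavenly stems (10) and earthly branches (12). The stem "wu" and
# the branch "wu" are distinct characters that share a toneless romanization.
STEMS = ["jia", "yi", "bing", "ding", "wu", "ji", "geng", "xin", "ren", "gui"]
BRANCHES = ["zi", "chou", "yin", "mao", "chen", "si",
            "wu", "wei", "shen", "you", "xu", "hai"]


def sexagenary_index(year):
    """Position 0..59 of a CE year within the 60-year cycle (1984 -> 0)."""
    return (year - 4) % 60


def sexagenary_name(year):
    """Stem-branch name of a CE year, e.g. 1984 -> 'jia-zi'."""
    i = sexagenary_index(year)
    return f"{STEMS[i % 10]}-{BRANCHES[i % 12]}"


def polar_year(year):
    """Hypothetical polar encoding (an assumption, not the paper's formula).

    radius = how many full 60-year cycles precede the year,
    theta  = angle of the year's position within its cycle.
    """
    cycle, pos = divmod(year - 4, 60)
    theta = 2 * math.pi * pos / 60
    return cycle, theta


# Illustrative usage.
name_2024 = sexagenary_name(2024)
radius_1984, theta_1984 = polar_year(1984)
radius_2044, theta_2044 = polar_year(2044)
```

Under this assumed mapping, years exactly 60 apart share an angle and differ by one unit of radius, which is one way a cyclic year expression could keep distant years comparable.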

COMP-PH · Feb 23, 2025
A new framework for X-ray absorption spectroscopy data analysis based on machine learning: XASDAML

Xue Han, Haodong Yao, Fei Zhan et al.

X-ray absorption spectroscopy (XAS) is a powerful technique to probe the electronic and structural properties of materials. With the rapid growth in both the volume and complexity of XAS datasets driven by advancements in synchrotron radiation facilities, there is an increasing demand for advanced computational tools capable of efficiently analyzing large-scale data. To address these needs, we introduce XASDAML, a flexible, machine learning based framework that integrates the entire data-processing workflow (including dataset construction for spectra and structural descriptors, data filtering, ML modeling, prediction, and model evaluation) into a unified platform. Additionally, it supports comprehensive statistical analysis, leveraging methods such as principal component analysis and clustering to reveal potential patterns and relationships within large datasets. Each module operates independently, allowing users to modify or upgrade modules in response to evolving research needs or technological advances. Moreover, the platform provides a user-friendly interface via Jupyter Notebook, making it accessible to researchers at varying levels of expertise. The versatility and effectiveness of XASDAML are exemplified by its application to a copper dataset, where it efficiently manages large and complex data, supports both supervised and unsupervised machine learning models, provides comprehensive statistics for structural descriptors, generates spectral plots, and accurately predicts coordination numbers and bond lengths. Furthermore, the platform streamlines the integration of XAS with machine learning and lowers the barriers to entry for new users.

CL · Jan 25
Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning

Chunxu Zhao, Xin Huang, Xue Han et al.

Despite the impressive reasoning abilities demonstrated by large language models (LLMs), empirical evidence indicates that they are not language agnostic as expected, leading to performance declines in multilingual settings, especially for low-resource languages. We attribute the decline to the model's inconsistent multilingual understanding and reasoning alignment. To address this, we present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR), aiming to improve the alignment of multilingual math reasoning abilities in LLMs. This approach designates the model's primary language as the pivot language. During training, the model first translates questions into the pivot language to facilitate better alignment of reasoning patterns. The reasoning process in the target language is then supervised by the pivot language's reasoning answers, thereby establishing a cross-lingual self-feedback mechanism without relying on external correct answers or reward models. Extensive experimental results demonstrate that our method enhances both the model's understanding of questions and its reasoning capabilities, leading to notable task improvements.

CL · Aug 22, 2025
MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts

Qing Wang, Xue Han, Jiahui Wang et al.

Despite LLMs' excellent code creation capabilities, multilingual code generation remains extremely challenging. To address this, we intend to improve the multi-programming-lingual (MultiPL) performance of the base LLMs while retaining the most popular ones using restricted computational resources. We consider MultiPL to be a special case of multiple natural languages and propose a MultiPL extension of LLMs utilizing a hybrid mixture of experts (MoE), called MultiPL-MoE. Specifically, MultiPL-MoE combines two paired MoEs to optimize expert selection at both the token and segment levels. The token-level MoE is a standard upcycling MoE structure with a shared expert and a novel gate weight normalization approach that aids in the final fusion with the segment-level MoE. The segment-level MoE incorporates two innovative designs to better capture the syntactic structure and contextual patterns of programming languages: first, using a sliding window to partition the input token sequence into multiple segments; then, adopting an expert-choice routing strategy that allows experts to select the top-k segments. The results of the experiment proved the effectiveness of MultiPL-MoE.
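
The two segment-level steps described in the abstract above (sliding-window partitioning, then expert-choice routing over segments) can be illustrated with a small sketch. This is an assumption-laden toy and not the MultiPL-MoE implementation: the window and stride values, the affinity scores, and the names `sliding_segments` and `expert_choice_route` are all hypothetical, and a real model would compute scores with a learned gate.

```python
def sliding_segments(tokens, window, stride):
    """Partition a token sequence into (possibly overlapping) windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    starts = list(range(0, len(tokens) - window + 1, stride))
    # Add a final window so the tail of the sequence is always covered.
    if starts[-1] + window < len(tokens):
        starts.append(len(tokens) - window)
    return [list(tokens[s:s + window]) for s in starts]


def expert_choice_route(scores, k):
    """Expert-choice routing: each expert picks its k highest-scoring segments.

    scores[e][s] is the affinity of expert e for segment s (hypothetical
    numbers here). Returns {expert index: [segment indices, best first]}.
    """
    routing = {}
    for expert, row in enumerate(scores):
        ranked = sorted(range(len(row)), key=lambda s: row[s], reverse=True)
        routing[expert] = ranked[:k]
    return routing


# Illustrative usage: 10 tokens, window 4, stride 3 gives 3 segments.
segments = sliding_segments(list(range(10)), window=4, stride=3)
scores = [
    [0.1, 0.7, 0.2],  # expert 0's affinity for each segment
    [0.5, 0.3, 0.9],  # expert 1's affinity for each segment
]
routing = expert_choice_route(scores, k=2)
```

Because the experts choose, every expert processes exactly k segments, while an individual segment may be picked by several experts or by none. That contrasts with token-choice routing, where each token selects its experts.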

IV · Mar 25, 2024
Real-time Neuron Segmentation for Voltage Imaging

Yosuke Bando, Ramdas Pillai, Atsushi Kajita et al.

In voltage imaging, where the membrane potentials of individual neurons are recorded at hundreds to thousands of frames per second using fluorescence microscopy, data processing presents a challenge. Even a fraction of a minute of recording with a limited image size yields gigabytes of video data consisting of tens of thousands of frames, which can be time-consuming to process. Moreover, millisecond-level short exposures lead to noisy video frames, obscuring neuron footprints especially in deep-brain samples where noisy signals are buried in background fluorescence. To address this challenge, we propose a fast neuron segmentation method able to detect multiple, potentially overlapping, spiking neurons from noisy video frames, and implement a data processing pipeline incorporating the proposed segmentation method along with GPU-accelerated motion correction. By testing on existing datasets as well as on new datasets we introduce, we show that our pipeline extracts neuron footprints that agree well with human annotation even from cluttered datasets, and demonstrate real-time processing of voltage imaging data on a single desktop computer for the first time.

SE · Oct 3, 2020
Automated Performance Tuning for Highly-Configurable Software Systems

Xue Han, Tingting Yu

Performance is an important non-functional aspect of the software requirement. Modern software systems are highly-configurable and misconfigurations may easily cause performance issues. A software system that suffers performance issues may exhibit low program throughput and long response time. However, the sheer size of the configuration space makes it challenging for administrators to manually select and adjust the configuration options to achieve better performance. In this paper, we propose ConfRL, an approach to tune software performance automatically. The key idea of ConfRL is to use reinforcement learning to explore the configuration space by a trial-and-error approach and to use the feedback received from the environment to tune configuration option values to achieve better performance. To reduce the cost of reinforcement learning, ConfRL employs sampling, clustering, and dynamic state reduction techniques to keep states in a large configuration space manageable. Our evaluation of four real-world highly-configurable server programs shows that ConfRL can efficiently and effectively guide software systems to achieve higher long-term performance.
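
The trial-and-error idea in the abstract above (act on a configuration, observe a performance signal, update) can be illustrated with generic tabular Q-learning on a toy problem. This is not ConfRL itself: the single `workers` option, its range, the `throughput` function, and all hyperparameters are hypothetical stand-ins, and the sampling, clustering, and state-reduction techniques the abstract mentions are not modeled.

```python
import random

LOW, HIGH = 1, 8       # hypothetical range of one option (e.g. worker threads)
ACTIONS = (-1, 0, 1)   # decrease, keep, or increase the option value


def throughput(workers):
    """Toy stand-in for a measured performance metric; peaks at 5 workers."""
    return -(workers - 5) ** 2


def step(state, action):
    """Apply an action, clamp to the valid range, and return (state, reward)."""
    nxt = min(HIGH, max(LOW, state + action))
    return nxt, throughput(nxt)


def train(episodes=300, horizon=15, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(LOW, HIGH + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randint(LOW, HIGH)
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)                         # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])   # exploit
            s2, reward = step(s, a)
            best_next = max(q[(s2, b)] for b in ACTIONS)
            # Standard Q-learning update toward the bootstrapped target.
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s2
    return q


def greedy_rollout(q, start, steps=10):
    """Follow the learned greedy policy from a starting configuration."""
    s = start
    for _ in range(steps):
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s, _ = step(s, a)
    return s


q_table = train()
tuned = greedy_rollout(q_table, start=1)
```

In a real system the reward would come from measuring the running software rather than from a closed-form function, which is why keeping the explored state space small matters for cost.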

CL · Jan 5, 2020
Automatic Business Process Structure Discovery using Ordered Neurons LSTM: A Preliminary Study

Xue Han, Lianxue Hu, Yabin Dang et al.

Automatic process discovery from textual process documentation is highly desirable to reduce the time and cost of Business Process Management (BPM) implementation in organizations. However, existing automatic process discovery approaches mainly focus on identifying activities from the documentation. Deriving the structural relationships between activities, which is important in the whole process discovery scope, is still a challenge. In fact, a business process has a latent semantic hierarchical structure which defines different levels of detail to reflect the complex business logic. Recent findings in the neural machine learning area show that meaningful linguistic structure can be induced by joint language modeling and structure learning. Inspired by these findings, we propose to retrieve the latent hierarchical structure present in the textual business process documents by building a neural network that leverages a novel recurrent architecture, Ordered Neurons LSTM (ON-LSTM), with a process-level language model objective. We tested the proposed approach on a data set of Process Description Documents (PDD) from our practical Robotic Process Automation (RPA) projects. Preliminary experiments showed promising results.