Yi Mei

h-index42

27papers

98citations

Novelty54%

AI Score55

Ranked #24,704 of 201,326 authors (top 12%)#979 in AI (top 7%)

27 Papers

CLMay 25

AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization

Haoran Gu, Handing Wang, Yi Mei et al.

Expensive optimization tasks are ubiquitous in real-world applications, demanding highly specialized solvers. While LLM-driven automated solver generation shows promise, current paradigms face three critical issues when tackling expensive optimization: factual hallucinations due to deficient domain knowledge, the frequent dismantling of previously established locally optimal structures during refinement, and the prohibitive evaluation costs alongside restricted generalization caused by executing on training instances. To address these issues, we introduce AutoSG, a fully automated workflow directly translating natural language prompts into executable customized solvers. AutoSG features three core innovations: a retrieval-augmented solver generation module strictly grounding code in verified literature; a one-step self-refinement operator introducing task-specific improvements while preserving critical structural components; and an instance-free Elo-based LLM-as-a-Judge evaluation mechanism rapidly establishing global rankings. Extensive evaluations across diverse expensive optimization tasks confirm AutoSG significantly outperforms human-designed state-of-the-art frameworks and existing LLM-generated solvers.

CRJan 1

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

Haoran Gu, Handing Wang, Yi Mei et al.

The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While prior studies have examined the safety of LLMs in general usage, code generation, and agent-based applications, their vulnerabilities in automated algorithm design remain underexplored. To fill this gap, this study investigates this overlooked safety vulnerability, with a particular focus on intelligent optimization algorithm design, given its prevalent use in complex decision-making scenarios. We introduce MalOptBench, a benchmark consisting of 60 malicious optimization algorithm requests, and propose MOBjailbreak, a jailbreak method tailored for this scenario. Through extensive evaluation of 13 mainstream LLMs including the latest GPT-5 and DeepSeek-V3.1, we reveal that most models remain highly susceptible to such attacks, with an average attack success rate of 83.59% and an average harmfulness score of 4.28 out of 5 on original harmful prompts, and near-complete failure under MOBjailbreak. Furthermore, we assess state-of-the-art plug-and-play defenses that can be applied to closed-source models, and find that they are only marginally effective against MOBjailbreak and prone to exaggerated safety behaviors. These findings highlight the urgent need for stronger alignment techniques to safeguard LLMs against misuse in algorithm design.

LGApr 22

Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem

Bo-Cheng Lin, Yi Mei, Mengjie Zhang

High-performance TSP solvers like LKH search within a sparsified candidate graph rather than over all possible edges. Graph sparsification is non-trivial: keep too many edges and the solver wastes time; cut too many and it loses edges that belong to the optimal tour. The two leading heuristic methods, $α$-Nearest and POPMUSIC, produce high-quality candidate graphs, but no single heuristic is both sparse and reliable across all instance sizes and distributions. Machine learning methods can potentially learn better sparsification models. However, existing approaches operate on the complete graph, which is expensive and mostly restricted to Euclidean distances. To address this issue, we propose a two-stage graph sparsification approach: Stage~1 takes the union of $α$-Nearest and POPMUSIC to maximise recall; Stage~2 trains a single model to reduce density. We conducted experiments across four TSPLIB distance types, five spatial distributions, and problem sizes from 50 to 500. The two-stage approach substantially reduces candidate-graph density while retaining high coverage, generalises across distance types and distributions, outperforms recent neural sparsification methods that are restricted to Euclidean distances, and becomes increasingly valuable at larger scales where single-stage heuristics degrade.

AIJan 22

Investigation of the Generalisation Ability of Genetic Programming-evolved Scheduling Rules in Dynamic Flexible Job Shop Scheduling

Luyao Zhu, Fangfang Zhang, Yi Mei et al.

Dynamic Flexible Job Shop Scheduling (DFJSS) is a complex combinatorial optimisation problem that requires simultaneous machine assignment and operation sequencing decisions in dynamic production environments. Genetic Programming (GP) has been widely applied to automatically evolve scheduling rules for DFJSS. However, existing studies typically train and test GP-evolved rules on DFJSS instances of the same type, which differ only by random seeds rather than by structural characteristics, leaving their cross-type generalisation ability largely unexplored. To address this gap, this paper systematically investigates the generalisation ability of GP-evolved scheduling rules under diverse DFJSS conditions. A series of experiments are conducted across multiple dimensions, including problem scale (i.e., the number of machines and jobs), key job shop parameters (e.g., utilisation level), and data distributions, to analyse how these factors influence GP performance on unseen instance types. The results show that good generalisation occurs when the training instances contain more jobs than the test instances while keeping the number of machines fixed, and when both training and test instances have similar scales or job shop parameters. Further analysis reveals that the number and distribution of decision points in DFJSS instances play a crucial role in explaining these performance differences. Similar decision point distributions lead to better generalisation, whereas significant discrepancies result in a marked degradation of performance. Overall, this study provides new insights into the generalisation ability of GP in DFJSS and highlights the necessity of evolving more generalisable GP rules capable of handling heterogeneous DFJSS instances effectively.

AIMar 26, 2024Code

Learning Traffic Signal Control via Genetic Programming

Xiao-Cheng Liao, Yi Mei, Mengjie Zhang

The control of traffic signals is crucial for improving transportation efficiency. Recently, learning-based methods, especially Deep Reinforcement Learning (DRL), garnered substantial success in the quest for more efficient traffic signal control strategies. However, the design of rewards in DRL highly demands domain knowledge to converge to an effective policy, and the final policy also presents difficulties in terms of explainability. In this work, a new learning-based method for signal control in complex intersections is proposed. In our approach, we design a concept of phase urgency for each signal phase. During signal transitions, the traffic light control strategy selects the next phase to be activated based on the phase urgency. We then proposed to represent the urgency function as an explainable tree structure. The urgency function can calculate the phase urgency for a specific phase based on the current road conditions. Genetic programming is adopted to perform gradient-free optimization of the urgency function. We test our algorithm on multiple public traffic signal control datasets. The experimental results indicate that the tree-shaped urgency function evolved by genetic programming outperforms the baselines, including a state-of-the-art method in the transportation field and a well-known DRL-based method. Our code is available online.

LGJan 30

Keep Rehearsing and Refining: Lifelong Learning Vehicle Routing under Continually Drifting Tasks

Jiyuan Pei, Yi Mei, Jialin Liu et al.

Existing neural solvers for vehicle routing problems (VRPs) are typically trained either in a one-off manner on a fixed set of pre-defined tasks or in a lifelong manner on several tasks arriving sequentially, assuming sufficient training on each task. Both settings overlook a common real-world property: problem patterns may drift continually over time, yielding massive tasks sequentially arising while offering only limited training resources per task. In this paper, we study a novel lifelong learning paradigm for neural VRP solvers under continually drifting tasks over learning time steps, where sufficient training for any given task at any time is not available. We propose Dual Replay with Experience Enhancement (DREE), a general framework to improve learning efficiency and mitigate catastrophic forgetting under such drift. Extensive experiments show that, under such continual drift, DREE effectively learns new tasks, preserves prior knowledge, improves generalization to unseen tasks, and can be applied to diverse existing neural solvers.

AINov 11, 2025

GAMA: A Neural Neighborhood Search Method with Graph-aware Multi-modal Attention for Vehicle Routing Problem

Xiangling Chen, Yi Mei, Mengjie Zhang

Recent advances in neural neighborhood search methods have shown potential in tackling Vehicle Routing Problems (VRPs). However, most existing approaches rely on simplistic state representations and fuse heterogeneous information via naive concatenation, limiting their ability to capture rich structural and semantic context. To address these limitations, we propose GAMA, a neural neighborhood search method with Graph-aware Multi-modal Attention model in VRP. GAMA encodes the problem instance and its evolving solution as distinct modalities using graph neural networks, and models their intra- and inter-modal interactions through stacked self- and cross-attention layers. A gated fusion mechanism further integrates the multi-modal representations into a structured state, enabling the policy to make informed and generalizable operator selection decisions. Extensive experiments conducted across various synthetic and benchmark instances demonstrate that the proposed algorithm GAMA significantly outperforms the recent neural baselines. Further ablation studies confirm that both the multi-modal attention mechanism and the gated fusion design play a key role in achieving the observed performance gains.

LGNov 8, 2025

SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control

Xiao-Cheng Liao, Yi Mei, Mengjie Zhang

Deep Reinforcement Learning have achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameterized and non-transparent, hindering their interpretability and deployability on resource-limited edge devices. This work presents SymLight, a priority function search framework based on Monte Carlo Tree Search (MCTS) for discovering inherently interpretable and deployable symbolic priority functions to serve as the TSC policies. The priority function, in particular, accepts traffic features as input and then outputs a priority for each traffic signal phase, which subsequently directs the phase transition. For effective search, we propose a concise yet expressive priority function representation. This helps mitigate the combinatorial explosion of the action space in MCTS. Additionally, a probabilistic structural rollout strategy is introduced to leverage structural patterns from previously discovered high-quality priority functions, guiding the rollout process. Our experiments on real-world datasets demonstrate SymLight's superior performance across a range of baselines. A key advantage is SymLight's ability to produce interpretable and deployable TSC policies while maintaining excellent performance.

NEMar 17

Surrogate-Assisted Genetic Programming with Rank-Based Phenotypic Characterisation for Dynamic Multi-Mode Project Scheduling

Yuan Tian, Yi Mei, Mengjie Zhang

The dynamic multi-mode resource-constrained project scheduling problem (DMRCPSP) is of practical importance, as it requires making real-time decisions under changing project states and resource availability. Genetic Programming (GP) has been shown to effectively evolve heuristic rules for such decision-making tasks; however, the evolutionary process typically relies on a large number of simulation-based fitness evaluations, resulting in high computational cost. Surrogate models offer a promising solution to reduce evaluation cost, but their application to GP requires problem-specific phenotypic characterisation (PC) schemes of heuristic rules. There is currently a lack of suitable PC schemes for GP applied to DMRCPSP. This paper proposes a rank-based PC scheme derived from heuristic-driven ordering of eligible activity-mode pairs and activity groups in decision situations. The resulting PC vectors enable a surrogate model to estimate the fitness of unevaluated GP individuals. Based on this scheme, a surrogate-assisted GP algorithm is developed. Experimental results demonstrate that the proposed surrogate-assisted GP can identify high-quality heuristic rules consistently earlier than the state-of-the-art GP approach for DMRCPSP, while introducing only marginal computational overhead. Further analyses demonstrate that the surrogate model provides useful guidance for offspring selection, leading to improved evolutionary efficiency.

LGMar 26, 2025Code

Generalized Phase Pressure Control Enhanced Reinforcement Learning for Traffic Signal Control

Xiao-Cheng Liao, Yi Mei, Mengjie Zhang et al.

Appropriate traffic state representation is crucial for learning traffic signal control policies. However, most of the current traffic state representations are heuristically designed, with insufficient theoretical support. In this paper, we (1) develop a flexible, efficient, and theoretically grounded method, namely generalized phase pressure (G2P) control, which takes only simple lane features into consideration to decide which phase to be actuated; 2) extend the pressure control theory to a general form for multi-homogeneous-lane road networks based on queueing theory; (3) design a new traffic state representation based on the generalized phase state features from G2P control; and 4) develop a reinforcement learning (RL)-based algorithm template named G2P-XLight, and two RL algorithms, G2P-MPLight and G2P-CoLight, by combining the generalized phase state representation with MPLight and CoLight, two well-performed RL methods for learning traffic signal control policies. Extensive experiments conducted on multiple real-world datasets demonstrate that G2P control outperforms the state-of-the-art (SOTA) heuristic method in the transportation field and other recent human-designed heuristic methods; and that the newly proposed G2P-XLight significantly outperforms SOTA learning-based approaches. Our code is available online.

SEMar 19

From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems

Shaolin Wang, Yi Mei, Haoyang Che et al.

Software systems have traditionally been designed for human interaction, emphasizing graphical user interfaces, usability, and cognitive alignment with end users. However, recent advances in large language model (LLM)-based agents are changing the primary consumers of software systems. Increasingly, software is no longer only used by humans, but also invoked autonomously by AI agents through structured interfaces. In this paper, we argue that software engineering is undergoing a paradigm shift from human-oriented interfaces to agent-oriented invocation systems. We formalize the notion of agent interfaces, introduce invocable capabilities as the fundamental building blocks of AI-oriented software, and outline design principles for such systems, including machine interpretability, composability, and invocation reliability. We then discuss architectural and organizational implications of this shift, highlighting a transition from monolithic applications to capability-based systems that can be dynamically composed by AI agents. The paper aims to provide a conceptual foundation for the emerging paradigm of AI-native software design.

AIJan 13, 2024

Distance-aware Attention Reshaping: Enhance Generalization of Neural Solver for Large-scale Vehicle Routing Problems

Yang Wang, Ya-Hui Jia, Wei-Neng Chen et al.

Neural solvers based on attention mechanism have demonstrated remarkable effectiveness in solving vehicle routing problems. However, in the generalization process from small scale to large scale, we find a phenomenon of the dispersion of attention scores in existing neural solvers, which leads to poor performance. To address this issue, this paper proposes a distance-aware attention reshaping method, assisting neural solvers in solving large-scale vehicle routing problems. Specifically, without the need for additional training, we utilize the Euclidean distance information between current nodes to adjust attention scores. This enables a neural solver trained on small-scale instances to make rational choices when solving a large-scale problem. Experimental results show that the proposed method significantly outperforms existing state-of-the-art neural solvers on the large-scale CVRPLib dataset.

LGMay 19, 2025

LiBOG: Lifelong Learning for Black-Box Optimizer Generation

Jiyuan Pei, Yi Mei, Jialin Liu et al.

Meta-Black-Box Optimization (MetaBBO) garners attention due to its success in automating the configuration and generation of black-box optimizers, significantly reducing the human effort required for optimizer design and discovering optimizers with higher performance than classic human-designed optimizers. However, existing MetaBBO methods conduct one-off training under the assumption that a stationary problem distribution with extensive and representative training problem samples is pre-available. This assumption is often impractical in real-world scenarios, where diverse problems following shifting distribution continually arise. Consequently, there is a pressing need for methods that can continuously learn from new problems encountered on-the-fly and progressively enhance their capabilities. In this work, we explore a novel paradigm of lifelong learning in MetaBBO and introduce LiBOG, a novel approach designed to learn from sequentially encountered problems and generate high-performance optimizers for Black-Box Optimization (BBO). LiBOG consolidates knowledge both across tasks and within tasks to mitigate catastrophic forgetting. Extensive experiments demonstrate LiBOG's effectiveness in learning to generate high-performance optimizers in a lifelong learning manner, addressing catastrophic forgetting while maintaining plasticity to learn new tasks.

AIJan 20

Scalable Knee-Point Guided Activity Group Selection in Multi-Tree Genetic Programming for Dynamic Multi-Mode Project Scheduling

Yuan Tian, Yi Mei, Mengjie Zhang

The dynamic multi-mode resource-constrained project scheduling problem is a challenging scheduling problem that requires making decisions on both the execution order of activities and their corresponding execution modes. Genetic programming has been widely applied as a hyper-heuristic to evolve priority rules that guide the selection of activity-mode pairs from the current eligible set. Recently, an activity group selection strategy has been proposed to select a subset of activities rather than a single activity at each decision point, allowing for more effective scheduling by considering the interdependence between activities. Although effective in small-scale instances, this strategy suffers from scalability issues when applied to larger problems. In this work, we enhance the scalability of the group selection strategy by introducing a knee-point-based selection mechanism to identify a promising subset of activities before evaluating their combinations. An activity ordering rule is first used to rank all eligible activity-mode pairs, followed by a knee point selection to find the promising pairs. Then, a group selection rule selects the best activity combination. We develop a multi-tree GP framework to evolve both types of rules simultaneously. Experimental results demonstrate that our approach scales well to large instances and outperforms GP with sequential decision-making in most scenarios.

LGAug 22, 2025

GPLight+: A Genetic Programming Method for Learning Symmetric Traffic Signal Control Policy

Xiao-Cheng Liao, Yi Mei, Mengjie Zhang

Recently, learning-based approaches, have achieved significant success in automatically devising effective traffic signal control strategies. In particular, as a powerful evolutionary machine learning approach, Genetic Programming (GP) is utilized to evolve human-understandable phase urgency functions to measure the urgency of activating a green light for a specific phase. However, current GP-based methods are unable to treat the common traffic features of different traffic signal phases consistently. To address this issue, we propose to use a symmetric phase urgency function to calculate the phase urgency for a specific phase based on the current road conditions. This is represented as an aggregation of two shared subtrees, each representing the urgency of a turn movement in the phase. We then propose a GP method to evolve the symmetric phase urgency function. We evaluate our proposed method on the well-known cityflow traffic simulator, based on multiple public real-world datasets. The experimental results show that the proposed symmetric urgency function representation can significantly improve the performance of the learned traffic signal control policies over the traditional GP representation on a wide range of scenarios. Further analysis shows that the proposed method can evolve effective, human-understandable and easily deployable traffic signal control policies.

LGApr 23, 2025

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data

Haoran Gu, Handing Wang, Yi Mei et al.

Aligning large language models with multiple human expectations and values is crucial for ensuring that they adequately serve a variety of user needs. To this end, offline multiobjective alignment algorithms such as the Rewards-in-Context algorithm have shown strong performance and efficiency. However, inappropriate preference representations and training with imbalanced reward scores limit the performance of such algorithms. In this work, we introduce ParetoHqD that addresses the above issues by representing human preferences as preference directions in the objective space and regarding data near the Pareto front as ''high-quality'' data. For each preference, ParetoHqD follows a two-stage supervised fine-tuning process, where each stage uses an individual Pareto high-quality training set that best matches its preference direction. The experimental results have demonstrated the superiority of ParetoHqD over five baselines on two multiobjective alignment tasks.

NEJan 26

HEATACO: Heatmap-Guided Ant Colony Decoding for Large-Scale Travelling Salesman Problems

Bo-Cheng Lin, Yi Mei, Mengjie Zhang

Heatmap-based non-autoregressive solvers for large-scale Travelling Salesman Problems output dense edge-probability scores, yet final performance largely hinges on the decoder that must satisfy degree-2 constraints and form a single Hamiltonian tour. Greedy commitment can cascade into irreparable mistakes at large $N$, whereas MCTS-guided local search is accurate but compute-heavy and highly engineered. We instead treat the heatmap as a soft edge prior and cast decoding as probabilistic tour construction under feasibility constraints, where the key is to correct local mis-rankings via inexpensive global coordination. Based on this view, we introduce HeatACO, a plug-and-play Max-Min Ant System decoder whose transition policy is softly biased by the heatmap while pheromone updates provide lightweight, instance-specific feedback to resolve global conflicts; optional 2-opt/3-opt post-processing further improves tour quality. On TSP500/1K/10K, using heatmaps produced by four pretrained predictors, HeatACO+2opt achieves gaps down to 0.11%/0.23%/1.15% with seconds-to-minutes CPU decoding for fixed heatmaps, offering a better quality--time trade-off than greedy decoding and published MCTS-based decoders. Finally, we find the gains track heatmap reliability: under distribution shift, miscalibration and confidence collapse bound decoding improvements, suggesting heatmap generalisation is a primary lever for further progress.

AISep 26, 2025

DyRo-MCTS: A Robust Monte Carlo Tree Search Approach to Dynamic Job Shop Scheduling

Ruiqi Chen, Yi Mei, Fangfang Zhang et al.

Dynamic job shop scheduling, a fundamental combinatorial optimisation problem in various industrial sectors, poses substantial challenges for effective scheduling due to frequent disruptions caused by the arrival of new jobs. State-of-the-art methods employ machine learning to learn scheduling policies offline, enabling rapid responses to dynamic events. However, these offline policies are often imperfect, necessitating the use of planning techniques such as Monte Carlo Tree Search (MCTS) to improve performance at online decision time. The unpredictability of new job arrivals complicates online planning, as decisions based on incomplete problem information are vulnerable to disturbances. To address this issue, we propose the Dynamic Robust MCTS (DyRo-MCTS) approach, which integrates action robustness estimation into MCTS. DyRo-MCTS guides the production environment toward states that not only yield good scheduling outcomes but are also easily adaptable to future job arrivals. Extensive experiments show that DyRo-MCTS significantly improves the performance of offline-learned policies with negligible additional online planning time. Moreover, DyRo-MCTS consistently outperforms vanilla MCTS across various scheduling scenarios. Further analysis reveals that its ability to make robust scheduling decisions leads to long-term, sustainable performance gains under disturbances.

AISep 26, 2025

Lifelong Learning with Behavior Consolidation for Vehicle Routing

Jiyuan Pei, Yi Mei, Jialin Liu et al.

Recent neural solvers have demonstrated promising performance in learning to solve routing problems. However, existing studies are primarily based on one-off training on one or a set of predefined problem distributions and scales, i.e., tasks. When a new task arises, they typically rely on either zero-shot generalization, which may be poor due to the discrepancies between the new task and the training task(s), or fine-tuning the pretrained solver on the new task, which possibly leads to catastrophic forgetting of knowledge acquired from previous tasks. This paper explores a novel lifelong learning paradigm for neural VRP solvers, where multiple tasks with diverse distributions and scales arise sequentially over time. Solvers are required to effectively and efficiently learn to solve new tasks while maintaining their performance on previously learned tasks. Consequently, a novel framework called Lifelong Learning Router with Behavior Consolidation (LLR-BC) is proposed. LLR-BC consolidates prior knowledge effectively by aligning behaviors of the solver trained on a new task with the buffered ones in a decision-seeking way. To encourage more focus on crucial experiences, LLR-BC assigns greater consolidated weights to decisions with lower confidence. Extensive experiments on capacitated vehicle routing problems and traveling salesman problems demonstrate LLR-BC's effectiveness in training high-performance neural solvers in a lifelong learning setting, addressing the catastrophic forgetting issue, maintaining their plasticity, and improving zero-shot generalization ability.

LGSep 2, 2025

Genetic Programming with Model Driven Dimension Repair for Learning Interpretable Appointment Scheduling Rules

Huan Zhang, Yang Wang, Ya-Hui Jia et al.

Appointment scheduling is a great challenge in healthcare operations management. Appointment rules (AR) provide medical practitioners with a simple yet effective tool to determine patient appointment times. Genetic programming (GP) can be used to evolve ARs. However, directly applying GP to design ARs may lead to rules that are difficult for end-users to interpret and trust. A key reason is that GP is unaware of the dimensional consistency, which ensures that the evolved rules align with users' domain knowledge and intuitive understanding. In this paper, we develop a new dimensionally aware GP algorithm with dimension repair to evolve ARs with dimensional consistency and high performance. A key innovation of our method is the dimension repair procedure, which optimizes the dimensional consistency of an expression tree while minimizing structural changes and ensuring that its output dimension meets the problem's requirements. We formulate the task as a mixed-integer linear programming model that can be efficiently solved using common mathematical programming methods. With the support of the dimension repair procedure, our method can explore a wider range of AR structures by temporarily breaking the dimensional consistency of individuals, and then restoring it without altering their overall structure, thereby identifying individuals with greater potential advantages. We evaluated the proposed method in a comprehensive set of simulated clinics. The experimental results demonstrate that our approach managed to evolve high-quality ARs that significantly outperform not only the manually designed ARs but also existing state-of-the-art dimensionally aware GP methods in terms of both objective values and dimensional consistency. In addition, we analyzed the semantics of the evolved ARs, providing insight into the design of more effective and interpretable ARs.

NEMay 17, 2025

Curriculum Learning in Genetic Programming Guided Local Search for Large-scale Vehicle Routing Problems

Saining Liu, Yi Mei, Mengjie Zhang

Manually designing (meta-)heuristics for the Vehicle Routing Problem (VRP) is a challenging task that requires significant domain expertise. Recently, data-driven approaches have emerged as a promising solution, automatically learning heuristics that perform well on training instances and generalize to unseen test cases. Such an approach learns (meta-)heuristics that can perform well on the training instances, expecting it to generalize well on the unseen test instances. A recent method, named GPGLS, uses Genetic Programming (GP) to learn the utility function in Guided Local Search (GLS) and solved large scale VRP effectively. However, the selection of appropriate training instances during the learning process remains an open question, with most existing studies including GPGLS relying on random instance selection. To address this, we propose a novel method, CL-GPGLS, which integrates Curriculum Learning (CL) into GPGLS. Our approach leverages a predefined curriculum to introduce training instances progressively, starting with simpler tasks and gradually increasing complexity, enabling the model to better adapt and optimize for large-scale VRP (LSVRP). Extensive experiments verify the effectiveness of CL-GPGLS, demonstrating significant performance improvements over three baseline methods.

CRMay 12, 2025

One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

Haoran Gu, Handing Wang, Yi Mei et al.

Large Language Models (LLMs) have been extensively used across diverse domains, including virtual assistants, automated code generation, and scientific research. However, they remain vulnerable to jailbreak attacks, which manipulate the models into generating harmful responses despite safety alignment. Recent studies have shown that current safety-aligned LLMs often undergo the shallow safety alignment, where the first few tokens largely determine whether the response will be harmful. Through comprehensive observations, we find that safety-aligned LLMs and various defense strategies generate highly similar initial tokens in their refusal responses, which we define as safety trigger tokens. Building on this insight, we propose \texttt{D-STT}, a simple yet effective defense algorithm that identifies and explicitly decodes safety trigger tokens of the given safety-aligned LLM to trigger the model's learned safety patterns. In this process, the safety trigger is constrained to a single token, which effectively preserves model usability by introducing minimum intervention in the decoding process. Extensive experiments across diverse jailbreak attacks and benign prompts demonstrate that \ours significantly reduces output harmfulness while preserving model usability and incurring negligible response time overhead, outperforming ten baseline methods.

NEMay 23, 2024

Multi-Representation Genetic Programming: A Case Study on Tree-based and Linear Representations

Zhixing Huang, Yi Mei, Fangfang Zhang et al.

Existing genetic programming (GP) methods are typically designed based on a certain representation, such as tree-based or linear representations. These representations show various pros and cons in different domains. However, due to the complicated relationships among representation and fitness landscapes of GP, it is hard to intuitively determine which GP representation is the most suitable for solving a certain problem. Evolving programs (or models) with multiple representations simultaneously can alternatively search on different fitness landscapes since representations are highly related to the search space that essentially defines the fitness landscape. Fully using the latent synergies among different GP individual representations might be helpful for GP to search for better solutions. However, existing GP literature rarely investigates the simultaneous effective use of evolving multiple representations. To fill this gap, this paper proposes a multi-representation GP algorithm based on tree-based and linear representations, which are two commonly used GP representations. In addition, we develop a new cross-representation crossover operator to harness the interplay between tree-based and linear representations. Empirical results show that navigating the learned knowledge between basic tree-based and linear representations successfully improves the effectiveness of GP with solely tree-based or linear representation in solving symbolic regression and dynamic job shop scheduling problems.

AIMay 3, 2023

Local Optima Correlation Assisted Adaptive Operator Selection

Jiyuan Pei, Hao Tong, Jialin Liu et al.

For solving combinatorial optimisation problems with metaheuristics, different search operators are applied for sampling new solutions in the neighbourhood of a given solution. It is important to understand the relationship between operators for various purposes, e.g., adaptively deciding when to use which operator to find optimal solutions efficiently. However, it is difficult to theoretically analyse this relationship, especially in the complex solution space of combinatorial optimisation problems. In this paper, we propose to empirically analyse the relationship between operators in terms of the correlation between their local optima and develop a measure for quantifying their relationship. The comprehensive analyses on a wide range of capacitated vehicle routing problem benchmark instances show that there is a consistent pattern in the correlation between commonly used operators. Based on this newly proposed local optima correlation metric, we propose a novel approach for adaptively selecting among the operators during the search process. The core intention is to improve search efficiency by preventing wasting computational resources on exploring neighbourhoods where the local optima have already been reached. Experiments on randomly generated instances and commonly used benchmark datasets are conducted. Results show that the proposed approach outperforms commonly used adaptive operator selection methods.

NEMay 11, 2021

A Hybrid Decomposition-based Multi-objective Evolutionary Algorithm for the Multi-Point Dynamic Aggregation Problem

Guanqiang Gao, Bin Xin, Yi Mei et al.

An emerging optimisation problem from the real-world applications, named the multi-point dynamic aggregation (MPDA) problem, has become one of the active research topics of the multi-robot system. This paper focuses on a multi-objective MPDA problem which is to design an execution plan of the robots to minimise the number of robots and the maximal completion time of all the tasks. The strongly-coupled relationships among robots and tasks, the redundancy of the MPDA encoding, and the variable-size decision space of the MO-MPDA problem posed extra challenges for addressing the problem effectively. To address the above issues, we develop a hybrid decomposition-based multi-objective evolutionary algorithm (HDMOEA) using $ \varepsilon $-constraint method. It selects the maximal completion time of all tasks as the main objective, and converted the other objective into constraints. HDMOEA decomposes a MO-MPDA problem into a series of scalar constrained optimization subproblems by assigning each subproblem with an upper bound robot number. All the subproblems are optimized simultaneously with the transferring knowledge from other subproblems. Besides, we develop a hybrid population initialisation mechanism to enhance the quality of initial solutions, and a reproduction mechanism to transmit effective information and tackle the encoding redundancy. Experimental results show that the proposed HDMOEA method significantly outperforms the state-of-the-art methods in terms of several most-used metrics.

NEDec 29, 2019

Divide-and-Conquer Large Scale Capacitated Arc Routing Problems with Route Cutting Off Decomposition

Yuzhou Zhang, Yi Mei, Buzhong Zhang et al.

The capacitated arc routing problem is a very important problem with many practical applications. This paper focuses on the large scale capacitated arc routing problem. Traditional solution optimization approaches usually fail because of their poor scalability. The divide-and-conquer strategy has achieved great success in solving large scale optimization problems by decomposing the original large problem into smaller sub-problems and solving them separately. For arc routing, a commonly used divide-and-conquer strategy is to divide the tasks into subsets, and then solve the sub-problems induced by the task subsets separately. However, the success of a divide-and-conquer strategy relies on a proper task division, which is non-trivial due to the complex interactions between the tasks. This paper proposes a novel problem decomposition operator, named the route cutting off operator, which considers the interactions between the tasks in a sophisticated way. To examine the effectiveness of the route cutting off operator, we integrate it with two state-of-the-art divide-and-conquer algorithms, and compared with the original counterparts on a wide range of benchmark instances. The results show that the route cutting off operator can improve the effectiveness of the decomposition, and lead to significantly better results especially when the problem size is very large and the time budget is very tight.

NENov 20, 2019

Genetic Programming Hyper-Heuristics with Vehicle Collaboration for Uncertain Capacitated Arc Routing Problems

Jordan MacLachlan, Yi Mei, Juergen Branke et al.

Due to its direct relevance to post-disaster operations, meter reading and civil refuse collection, the Uncertain Capacitated Arc Routing Problem (UCARP) is an important optimisation problem. Stochastic models are critical to study as they more accurately represent the real-world than their deterministic counterparts. Although there have been extensive studies in solving routing problems under uncertainty, very few have considered UCARP, and none consider collaboration between vehicles to handle the negative effects of uncertainty. This paper proposes a novel Solution Construction Procedure (SCP) that generates solutions to UCARP within a collaborative, multi-vehicle framework. It consists of two types of collaborative activities: one when a vehicle unexpectedly expends capacity (\emph{route failure}), and the other during the refill process. Then, we propose a Genetic Programming Hyper-Heuristic (GPHH) algorithm to evolve the routing policy used within the collaborative framework. The experimental studies show that the new heuristic with vehicle collaboration and GP-evolved routing policy significantly outperforms the compared state-of-the-art algorithms on commonly studied test problems. This is shown to be especially true on instances with larger numbers of tasks and vehicles. This clearly shows the advantage of vehicle collaboration in handling the uncertain environment, and the effectiveness of the newly proposed algorithm.