Peng Yang

LG
h-index37
74papers
1,242citations
Novelty51%
AI Score57

74 Papers

73.3CEMay 31
From Flat to Hierarchical: Evolving Tree-structured Thoughts for Fine-grained Alpha Mining

Junji Ren, Junjie Zhao, Shengcai Liu et al.

Alpha mining, aimed at discovering predictive return signals, is typically formulated as symbolic regression. Traditional symbolic methods suffer from search inefficiency and biased prior knowledge. Recently, Large Language Models (LLMs) have emerged as a promising alternative, automatically generating textual thoughts and executable codes to achieve both efficient and interpretable alpha mining. However, existing approaches mostly focus on leveraging LLM's reasoning and reflection capabilities, yet largely neglect the positional bias due to the flat thought representation which restricts efficiency and diversity of the search process. This paper introduces Tree-structured thought Evolution (TreEvo), which evolves hierarchically decomposed thoughts to expand the effective search space. In addition, we propose a set of evolutionary operators tailored to structured thoughts. Experiments on four real-market datasets demonstrate that TreEvo not only obtains competitive alphas with traditional methods in up to 200 times fewer evaluations, but also consistently outperforms LLM-driven EAs across all datasets by $14.31\%$ on average.

SYDec 31, 2022
Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning

Wen Wu, Peng Yang, Weiting Zhang et al.

Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaption which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is the key in minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.

SYOct 30, 2018
Distributed Load-Side Control: Coping with Variation of Renewable Generations

Zhaojian Wang, Shengwei Mei, Feng Liu et al.

This paper addresses the distributed frequency control problem in a multi-area power system taking into account of unknown time-varying power imbalance. Particularly, fast controllable loads are utilized to restore system frequency under changing power imbalance in an optimal manner. The imbalanced power causing frequency deviation is decomposed into three parts: a known constant part, an unknown low-frequency variation and a high-frequency residual. The known steady part is usually the prediction of power imbalance. The variation may result from the fluctuation of renewable resources, electric vehicle charging, etc., which is usually unknown to operators. The high-frequency residual is usually unknown and treated as an external disturbance. Correspondingly, in this paper, we resolve the following three problems in different timescales: 1) allocate the steady part of power imbalance economically; 2) mitigate the effect of unknown low-frequency power variation locally; 3) attenuate unknown high-frequency disturbances. To this end, a distributed controller combining consensus method with adaptive internal model control is proposed. We first prove that the closed-loop system is asymptotically stable and converges to the optimal solution of an optimization problem if the external disturbance is not included. We then prove that the power variation can be mitigated accurately. Furthermore, we show that the closed-loop system is robust against both parameter uncertainty and external disturbances. The New England system is used to verify the efficacy of our design.

CVMay 31, 2022
One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching

Khoa D. Doan, Peng Yang, Ping Li · baidu

Image hashing is a principled approximate nearest neighbor approach to find similar items to a query in a large collection of images. Hashing aims to learn a binary-output function that maps an image to a binary vector. For optimal retrieval performance, producing balanced hash codes with low-quantization error to bridge the gap between the learning stage's continuous relaxation and the inference stage's discrete quantization is important. However, in the existing deep supervised hashing methods, coding balance and low-quantization error are difficult to achieve and involve several losses. We argue that this is because the existing quantization approaches in these methods are heuristically constructed and not effective to achieve these objectives. This paper considers an alternative approach to learning the quantization constraints. The task of learning balanced codes with low quantization error is re-formulated as matching the learned distribution of the continuous codes to a pre-defined discrete, uniform distribution. This is equivalent to minimizing the distance between two distributions. We then propose a computationally efficient distributional distance by leveraging the discrete property of the hash functions. This distributional distance is a valid distance and enjoys lower time and sample complexities. The proposed single-loss quantization objective can be integrated into any existing supervised hashing method to improve code balance and quantization error. Experiments confirm that the proposed approach substantially improves the performance of several representative hashing~methods.

CVOct 26, 2022
CU-Net: LiDAR Depth-Only Completion With Coupled U-Net

Yufei Wang, Yuchao Dai, Qi Liu et al.

LiDAR depth-only completion is a challenging task to estimate dense depth maps only from sparse measurement points obtained by LiDAR. Even though the depth-only methods have been widely developed, there is still a significant performance gap with the RGB-guided methods that utilize extra color images. We find that existing depth-only methods can obtain satisfactory results in the areas where the measurement points are almost accurate and evenly distributed (denoted as normal areas), while the performance is limited in the areas where the foreground and background points are overlapped due to occlusion (denoted as overlap areas) and the areas where there are no measurement points around (denoted as blank areas) since the methods have no reliable input information in these areas. Building upon these observations, we propose an effective Coupled U-Net (CU-Net) architecture for depth-only completion. Instead of directly using a large network for regression, we employ the local U-Net to estimate accurate values in the normal areas and provide the global U-Net with reliable initial values in the overlap and blank areas. The depth maps predicted by the two coupled U-Nets are fused by learned confidence maps to obtain final results. In addition, we propose a confidence-based outlier removal module, which removes outliers using simple judgment conditions. Our proposed method boosts the final results with fewer parameters and achieves state-of-the-art results on the KITTI benchmark. Moreover, it owns a powerful generalization ability under various depth densities, varying lighting, and weather conditions.

CVJun 24, 2022
Defending Backdoor Attacks on Vision Transformer via Patch Processing

Khoa D. Doan, Yingjie Lao, Peng Yang et al. · baidu

Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, security and robustness of ViTs are also of great importance to study. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Then, based on this finding, we propose an effective method for ViTs to defend both patch-based and blending-based trigger backdoor attacks via patch processing. The performances are evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, which show the proposed novel defense is very successful in mitigating backdoor attacks for ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks. The paper will appear in the Proceedings of the AAAI'23 Conference. This work was initially submitted in November 2021 to CVPR'22, then it was re-submitted to ECCV'22. The paper was made public in June 2022. The authors sincerely thank all the referees from the Program Committees of CVPR'22, ECCV'22, and AAAI'23.

ITMar 29, 2022
Over-the-Air Federated Learning via Second-Order Optimization

Peng Yang, Yuning Jiang, Ting Wang et al.

Federated learning (FL) is a promising learning paradigm that can tackle the increasingly prominent isolated data islands problem while keeping users' data locally with privacy and security guarantees. However, FL could result in task-oriented data traffic flows over wireless networks with limited radio resources. To design communication-efficient FL, most of the existing studies employ the first-order federated optimization approach that has a slow convergence rate. This however results in excessive communication rounds for local model updates between the edge devices and edge server. To address this issue, in this paper, we instead propose a novel over-the-air second-order federated optimization algorithm to simultaneously reduce the communication rounds and enable low-latency global model aggregation. This is achieved by exploiting the waveform superposition property of a multi-access channel to implement the distributed second-order optimization algorithm over wireless networks. The convergence behavior of the proposed algorithm is further characterized, which reveals a linear-quadratic convergence rate with an accumulative error term in each iteration. We thus propose a system optimization approach to minimize the accumulated error gap by joint device selection and beamforming design. Numerical results demonstrate the system and communication efficiency compared with the state-of-the-art approaches.

NIDec 31, 2022
Cost-Effective Two-Stage Network Slicing for Edge-Cloud Orchestrated Vehicular Networks

Wen Wu, Kaige Qu, Peng Yang et al.

In this paper, we study a network slicing problem for edge-cloud orchestrated vehicular networks, in which the edge and cloud servers are orchestrated to process computation tasks for reducing network slicing cost while satisfying the quality of service requirements. We propose a two-stage network slicing framework, which consists of 1) network planning stage in a large timescale to perform slice deployment, edge resource provisioning, and cloud resource provisioning, and 2) network operation stage in a small timescale to perform resource allocation and task dispatching. Particularly, we formulate the network slicing problem as a two-timescale stochastic optimization problem to minimize the network slicing cost. Since the problem is NP-hard due to coupled network planning and network operation stages, we develop a Two timescAle netWork Slicing (TAWS) algorithm by collaboratively integrating reinforcement learning (RL) and optimization methods, which can jointly make network planning and operation decisions. Specifically, by leveraging the timescale separation property of decisions, we decouple the problem into a large-timescale network planning subproblem and a small-timescale network operation subproblem. The former is solved by an RL method, and the latter is solved by an optimization method. Simulation results based on real-world vehicle traffic traces show that the TAWS can effectively reduce the network slicing cost as compared to the benchmark scheme.

MMJul 29, 2024
AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics

Xiangxiang Dai, Zeyu Zhang, Peng Yang et al. · uw

The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarantee accuracy by leveraging edge computing to dynamically select the most efficient visual models for video analytics under diverse scenarios. Utilizing a tiered edge-cloud architecture, AxiomVision enables the deployment of a broad spectrum of visual models, from lightweight to complex DNNs, that can be tailored to specific scenarios while considering camera source impacts. In addition, AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism utilizing continual online learning, (2) an efficient online method that efficiently takes into account the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates the model selection process. With rigorous theoretical guarantees, these advancements provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7\% improvement in accuracy.

SDDec 13, 2022
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis

Chunyu Qiang, Peng Yang, Hao Che et al.

Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesised speech of a target speaker's timbre. Most previous approaches rely on data with style labels, but manually-annotated labels are expensive and not always reliable. In response to this problem, we propose Style-Label-Free, a cross-speaker style transfer method, which can realize the style transfer from source speaker to target speaker without style labels. Firstly, a reference encoder structure based on quantized variational autoencoder (Q-VAE) and style bottleneck is designed to extract discrete style representations. Secondly, a speaker-wise batch normalization layer is proposed to reduce the source speaker leakage. In order to improve the style extraction ability of the reference encoder, a style invariant and contrastive data augmentation method is proposed. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.

CLNov 5, 2025Code
Step-Audio-EditX Technical Report

Chao Yan, Boyong Wu, Peng Yang et al.

We present Step-Audio-EditX, the first open-source LLM-based audio model excelling at expressive and iterative audio editing encompassing emotion, speaking style, and paralinguistics alongside robust zero-shot text-to-speech (TTS) capabilities. Our core innovation lies in leveraging only large-margin synthetic data, which circumvents the need for embedding-based priors or auxiliary modules. This large-margin learning approach enables both iterative control and high expressivity across voices, and represents a fundamental pivot from the conventional focus on representation-level disentanglement. Evaluation results demonstrate that Step-Audio-EditX surpasses both MiniMax-2.6-hd and Doubao-Seed-TTS-2.0 in emotion editing and other fine-grained control tasks.

SDNov 17, 2022
Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Chunyu Qiang, Peng Yang, Hao Che et al.

Conversion of Chinese Grapheme-to-Phoneme (G2P) plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is the task of polyphone disambiguation. Most of the previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for polyphone disambiguation are scarce. In this paper we propose a simple back-translation-style data augmentation method for mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data. Inspired by the back-translation technique proposed in the field of machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic character, and a Phoneme-to-Grapheme (P2G) model to predict pronunciation into text. Meanwhile, a window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of the pseudo-label. We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity. The experimental result shows the effectiveness of the proposed back-translation-style data augmentation method.

CRNov 17, 2022
SFPDML: Securer and Faster Privacy-Preserving Distributed Machine Learning based on MKTFHE

Hongxiao Wang, Zoe L. Jiang, Yanmin Zhao et al.

In recent years, distributed machine learning has garnered significant attention. However, privacy continues to be an unresolved issue within this field. Multi-key homomorphic encryption over torus (MKTFHE) is one of the promising candidates for addressing this concern. Nevertheless, there may be security risks in the decryption of MKTFHE. Moreover, to our best known, the latest works about MKTFHE only support Boolean operation and linear operation which cannot directly compute the non-linear function like Sigmoid. Therefore, it is still hard to perform common machine learning such as logistic regression and neural networks in high performance. In this paper, we first discover a possible attack on the existing distributed decryption protocol for MKTFHE and subsequently introduce secret sharing to propose a securer one. Next, we design a new MKTFHE-friendly activation function via \emph{homogenizer} and \emph{compare quads}. Finally, we utilize them to implement logistic regression and neural network training in MKTFHE. Comparing the efficiency and accuracy between using Taylor polynomials of Sigmoid and our proposed function as an activation function, the experiments show that the efficiency of our function is 10 times higher than using 7-order Taylor polynomials straightly and the accuracy of the training model is similar to using a high-order polynomial as an activation function scheme.

LGMar 1Code
MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

Abdulhamid M. Mousa, Yu Fu, Rakhmonberdi Khajiev et al.

Reinforcement learning (RL), large language models (LLMs), and vision-language models (VLMs) have been widely studied in isolation. However, existing infrastructure lacks the ability to deploy agents from different decision-making paradigms within the same environment, making it difficult to study them in hybrid multi-agent settings or to compare their behaviour fairly under identical conditions. We present MOSAIC, an open-source platform that bridges this gap by incorporating a diverse set of existing reinforcement learning environments and enabling heterogeneous agents (RL policies, LLMs, VLMs, and human players) to operate within them in ad-hoc team settings with reproducible results. MOSAIC introduces three contributions. (i) An IPC-based worker protocol that wraps both native and third-party frameworks as isolated subprocess workers, each executing its native training and inference logic unmodified, communicating through a versioned inter-process protocol. (ii) An operator abstraction that forms an agent-level interface by mapping workers to agents: each operator, regardless of whether it is backed by an RL policy, an LLM, or a human, conforms to a minimal unified interface. (iii) A deterministic cross-paradigm evaluation framework offering two complementary modes: a manual mode that advances up to N concurrent operators in lock-step under shared seeds for fine-grained visual inspection of behavioural differences, and a script mode that drives automated, long-running evaluation through declarative Python scripts, for reproducible experiments. We release MOSAIC as an open, visual-first platform to facilitate reproducible cross-paradigm research across the RL, LLM, and human-in-the-loop communities.

LGMar 23, 2023
Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence

Chaoqun You, Kun Guo, Gang Feng et al.

Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner. Recently, FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets. However, existing research simply combines MAML and FL without explicitly addressing how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks. In this paper, we quantify the benefit from two aspects: optimizing FL hyperparameters (i.e., sampled data size and the number of communication rounds) and resource allocation (i.e., transmit power) in mobile edge networks. Specifically, we formulate the MAML-based FL design as an overall learning time minimization problem, under the constraints of model accuracy and energy consumption. Facilitated by the convergence analysis of MAML-based FL, we decompose the formulated problem and then solve it using analytical solutions and the coordinate descent method. With the obtained FL hyperparameters and resource allocation, we design a MAML-based FL algorithm, called Automated Federated Learning (AutoFL), that is able to conduct fast adaptation and convergence. Extensive experimental results verify that AutoFL outperforms other benchmark algorithms regarding the learning time and convergence performance.

95.8SYApr 17
Data-Driven Distributed Stability Certification for Power Systems via Input-State Trajectories

Xiaohui Zhang, Liaoyuan Yang, Peng Yang

This article proposes a data-driven framework to verify the distributed conditions that guarantee the system-wide stability for interconnected power systems. To guarantee system wide stability, the dynamics of each bus are required to satisfy an output differential passivity (ODP) condition with a sufficient index. These ODP indices uniformly quantify the impacts on the system-wide stability of individual bus dynamics and the coupling strength from the power network. To obtain these indices without explicit physical models, we derive a data-driven linear matrix inequality (LMI) criterion based exclusively on measured input-state trajectories. Furthermore, extracting the optimal ODP index is formulated as a convex semi-definite programming (SDP) problem. Simulations verify the effectiveness of the proposed method under both single-device offline evaluation and system-wide online certification scenarios.

SDMar 14, 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

Chunyu Qiang, Peng Yang, Hao Che et al.

Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesized speech of a target speaker's timbre. In most previous methods, the synthesized fine-grained prosody features often represent the source speaker's average style, similar to the one-to-many problem(i.e., multiple prosody variations correspond to the same text). In response to this problem, a strength-controlled semi-supervised style extractor is proposed to disentangle the style from content and timbre, improving the representation and interpretability of the global style embedding, which can alleviate the one-to-many mapping and data imbalance problems in prosody prediction. A hierarchical prosody predictor is proposed to improve prosody modeling. We find that better style transfer can be achieved by using the source speaker's prosody features that are easily predicted. Additionally, a speaker-transfer-wise cycle consistency loss is proposed to assist the model in learning unseen style-timbre combinations during the training phase. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.

LGOct 10, 2023
Diversity from Human Feedback

Ren-Jian Wang, Ke Xue, Yutong Wang et al.

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior learned by DivHF is much more consistent with human requirements than the one learned by direct data-driven approaches without human feedback, and makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.

CLSep 23, 2022
KeypartX: Graph-based Perception (Text) Representation

Peng Yang

The availability of big data has opened up big opportunities for individuals, businesses and academics to view big into what is happening in their world. Previous works of text representation mostly focused on informativeness from massive words' frequency or cooccurrence. However, big data is a double-edged sword which is big in volume but unstructured in format. The unstructured edge requires specific techniques to transform 'big' into meaningful instead of informative alone. This study presents KeypartX, a graph-based approach to represent perception (text in general) by key parts of speech. Different from bag-of-words/vector-based machine learning, this technique is human-like learning that could extracts meanings from linguistic (semantic, syntactic and pragmatic) information. Moreover, KeypartX is big-data capable but not hungry, which is even applicable to the minimum unit of text:sentence.

53.3CEApr 14
Efficient Parameter Calibration of Numerical Weather Prediction Models via Evolutionary Sequential Transfer Optimization

Heping Fang, Bingdong Li, Peng Yang

The configuration of physical parameterization schemes in Numerical Weather Prediction (NWP) models plays a critical role in determining the accuracy of the forecast. However, existing parameter calibration methods typically treat each calibration task as an isolated optimization problem. This approach suffers from prohibitive computational costs and necessitates performing iterative searches from scratch for each task, leading to low efficiency in sequential calibration scenarios. To address this issue, we propose the SEquential Evolutionary Transfer Optimization (SEETO) algorithm driven by the representations of the meteorological state. First, to accurately measure the physical similarity between calibration tasks, a meteorological state representation extractor is introduced to map high-dimensional meteorological fields into latent representations. Second, given the similarity in the latent space, a bi-level adaptive knowledge transfer mechanism is designed. At the solution level, superior populations from similar historical tasks are reused to achieve a "warm start" for optimization. At the model level, an ensemble surrogate model based on source task data is constructed to assist the search, employing an adaptive weighting mechanism to dynamically balance the contributions of source domain knowledge and target domain data. Extensive experiments across 10 distinct calibration tasks, which span varying source-target similarities, highlight SEETO's superior efficiency. Under a strict budget of 20 expensive evaluations, SEETO achieves a 6% average improvement in Hypervolume (HV) over two state-of-the-art baselines. Notably, to match SEETO's performance at this stage, the comparison algorithms would require an average of 64% and 28% additional evaluations, respectively. This presents a new paradigm for the efficient and accurate automated calibration of NWP model parameters.

51.6SEMar 27
Large Language Models for Software Testing Education: an Experience Report

Peng Yang, Yunfeng Zhu, Chao Chang et al.

The rapid integration of Large Language Models (LLMs) into software engineering practice is reshaping how software testing activities are performed. LLMs are increasingly used to support software testing. Consequently, software testing education must evolve to prepare students for this new paradigm. However, while students have already begun to use LLMs in an ad hoc manner for testing tasks, there is limited empirical understanding of how such usage influences their testing behaviors, judgment, and learning outcomes. It is necessary to conduct a systematic investigation into how students learn to evaluate, control, and refine LLM-assisted testing results. This paper presents a mixed-methods, two-phase exploratory study on human-LLM collaboration in software testing education. In Phase I, we analyze classroom learning artifacts and interaction records from 15 students, together with a large-scale survey conducted in a national software testing competition (337 valid responses), to identify recurring prompt-related difficulties across testing tasks. The results reveal systematic interaction breakdowns, including missing contextual information, insufficient constraints, rigid one-shot prompting, and limited strategy-driven iteration, with automated test script generation emerging as a particularly heterogeneous and effort-intensive interaction context. Building on these findings, Phase II conducts an illustrative classroom practice that operationalizes the observed breakdowns into a lightweight, stage-aware prompt scaffold for test script generation, guiding students to explicitly articulate execution-relevant information such as environmental assumptions, interaction grounding, synchronization, and validation intent, and reporting descriptive shifts in students' testing-related articulation when interacting with LLMs.

80.2NIApr 10
Generative AI Agent Empowered Power Allocation for HAP Propulsion and Communication Systems

Xiaoyu Xing, Dingyi Lu, Peng Yang et al.

High altitude platforms (HAPs) are emerging as a key enabler for 6G coverage, yet limited energy must be split between propulsion and communications. Most prior HAP studies ignore propulsion power or rely on surrogates that miss hull-propeller interference, leading to misestimated communication power budgets and degraded beamforming. More importantly, HAP power allocation is intrinsically a multi-system, multidisciplinary problem in which aerodynamics, propulsion-system efficiency, and communication-system performance (quality of service (QoS) and energy efficiency (EE)) are tightly coupled.To address these challenges, this paper designs an interactive generative artificial intelligence (AI)-empowered HAP power allocation agent.By interacting with the AI agent, we develop an accurate propulsion power consumption model that takes into account the aerodynamic interference between the HAP's hull and the propeller. Assisted by the AI agent, we further formulate a HAP beamforming problem to improve user QoS and enhance the EE of the HAP communication system.This paper also proposes a QoS-enhanced energy-efficient (Q3E) beamforming algorithm to solve the formulated problem.Simulation results demonstrate the accuracy of the propulsion-power model and the effectiveness of the Q3E algorithm.

93.9NIApr 10
Multimodal Large Language Model Enabled Robust Beamforming for HAP Downlink Communications

Xiaoyu Xing, Peng Yang, Guoquan Tao et al.

Small changes in high altitude platform (HAP) attitude can cause significant deviations in HAP downlink beam directions, thereby severely degrading HAP downlink communication performance. In this paper, we develop a multimodal large language model (LLM) enabled beamforming framework to achieve robust HAP downlink communications.Specifically, we design a vision-language LLM (VL-LLM) that learns from multivariate flight telemetry to forecast short-term HAP attitudes under platform shaking and support delay-aware proactive beam steering.We design an offline forecast-error calibration procedure to obtain upper bounds on forecast errors and improve the reliability of proactive analog beam steering.Based on the attitude forecasts, we proactively update the analog beamformer and propose a QoS-driven beamforming and admission method with a lightweight feasibility-enforcement step to satisfy instantaneous transmit-power and QoS requirements.Simulation results indicate that the designed VL-LLM can accurately capture changes in the HAP attitude and the proposed beamforming method achieves a 22.1% higher user service ratio and a 12.5% higher sum-rate than representative baselines.The measured mean and p99 total latencies are 36.24 ms and 40.13 ms, respectively, supporting practical delay-aware deployment.

CPSep 8, 2024
QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE

Junjie Zhao, Chengxi Zhang, Min Qin et al.

Alpha factor mining aims to discover investment signals from the historical financial market data, which can be used to predict asset returns and gain excess profits. Powerful deep learning methods for alpha factor mining lack interpretability, making them unacceptable in the risk-sensitive real markets. Formulaic alpha factors are preferred for their interpretability, while the search space is complex and powerful explorative methods are urged. Recently, a promising framework is proposed for generating formulaic alpha factors using deep reinforcement learning, and quickly gained research focuses from both academia and industries. This paper first argues that the originally employed policy training method, i.e., Proximal Policy Optimization (PPO), faces several important issues in the context of alpha factors mining. Herein, a novel reinforcement learning algorithm based on the well-known REINFORCE algorithm is proposed. REINFORCE employs Monte Carlo sampling to estimate the policy gradient-yielding unbiased but high variance estimates. The minimal environmental variability inherent in the underlying state transition function, which adheres to the Dirac distribution, can help alleviate this high variance issue, making REINFORCE algorithm more appropriate than PPO. A new dedicated baseline is designed to theoretically reduce the commonly suffered high variance of REINFORCE. Moreover, the information ratio is introduced as a reward shaping mechanism to encourage the generation of steady alpha factors that can better adapt to changes in market volatility. Evaluations on real assets data indicate the proposed algorithm boosts correlation with returns by 3.83\%, and a stronger ability to obtain excess returns compared to the latest alpha factors mining methods, which meets the theoretical results well.

LGSep 9, 2024
Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks

Yuxin Liang, Peng Yang, Yuanyuan He et al.

The surging development of Artificial Intelligence-Generated Content (AIGC) marks a transformative era of the content creation and production. Edge servers promise attractive benefits, e.g., reduced service delay and backhaul traffic load, for hosting AIGC services compared to cloud-based solutions. However, the scarcity of available resources on the edge pose significant challenges in deploying generative AI models. In this paper, by characterizing the resource and delay demands of typical generative AI models, we find that the consumption of storage and GPU memory, as well as the model switching delay represented by I/O delay during the preloading phase, are significant and vary across models. These multidimensional coupling factors render it difficult to make efficient edge model deployment decisions. Hence, we present a collaborative edge-cloud framework aiming to properly manage generative AI model deployment on the edge. Specifically, we formulate edge model deployment problem considering heterogeneous features of models as an optimization problem, and propose a model-level decision selection algorithm to solve it. It enables pooled resource sharing and optimizes the trade-off between resource consumption and delay in edge generative AI model deployment. Simulation results validate the efficacy of the proposed algorithm compared with baselines, demonstrating its potential to reduce overall costs by providing feature-aware model deployment decisions.

40.3SEApr 5
Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models

Youwei Huang, Jianwen Li, Bin Hu et al.

Malicious developer intents in smart contracts constitute significant security threats to decentralized applications, leading to substantial economic losses. Prior work introduced SmartIntentNN, a deep learning model for detecting unsafe developer intents. By combining the Universal Sentence Encoder, a K-means clustering-based intent highlighting mechanism, and a Bidirectional Long Short-Term Memory (BiLSTM) network, the model achieved an F1 score of 0.8633 on an evaluation set of 10,000 real-world smart contracts across ten distinct intent categories. This paper presents SmartIntentV2 (Smart Contract Intent Neural Network Version 2). The primary enhancement is the integration of a BERT-based pre-trained programming language model, which we domain-adaptively pre-train on a dataset of 16,000 real-world smart contracts using a Masked Language Modeling objective. SmartIntentV2 retains the BiLSTM-based multi-label classification network for intent detection. On the same evaluation set of 10,000 smart contracts, it achieves superior performance with an accuracy of 0.9789, precision of 0.9090, recall of 0.9476, and an F1 score of 0.9279, substantially outperforming its predecessor and other baseline models. Notably, SmartIntentV2 also delivers a 65.5% relative improvement in F1 score over GPT-4.1 on this specialized task. These results establish SmartIntentV2 as a new state-of-the-art model for smart contract intent detection.

31.7LGApr 1
A Decoupled Basis-Vector-Driven Generative Framework for Dynamic Multi-Objective Optimization

Yaoming Yang, Shuai Wang, Bingdong Li et al.

Dynamic multi-objective optimization requires continuous tracking of moving Pareto fronts. Existing methods struggle with irregular mutations and data sparsity, primarily facing three challenges: the non-linear coupling of dynamic modes, negative transfer from outdated historical data, and the cold-start problem during environmental switches. To address these issues, this paper proposes a decoupled basis-vector-driven generative framework (DB-GEN). First, to resolve non-linear coupling, the framework employs the discrete wavelet transform to separate evolutionary trajectories into low-frequency trends and high-frequency details. Second, to mitigate negative transfer, it learns transferable basis vectors via sparse dictionary learning rather than directly memorizing historical instances. Recomposing these bases under a topology-aware contrastive constraint constructs a structured latent manifold. Finally, to overcome the cold-start problem, a surrogate-assisted search paradigm samples initial populations from this manifold. Pre-trained on 120 million solutions, DB-GEN performs direct online inference without retraining or fine-tuning. This zero-shot generation process executes in milliseconds, requiring approximately 0.2 seconds per environmental change. Experimental results demonstrate that DB-GEN improves tracking accuracy across various dynamic benchmarks compared to existing algorithms.

92.1AIMar 31Code
Owl-AuraID 1.0: An Intelligent System for Autonomous Scientific Instrumentation and Scientific Data Analysis

Han Deng, Anqi Zou, Hanling Zhang et al.

Scientific discovery increasingly depends on high-throughput characterization, yet automation is hindered by proprietary GUIs and the limited generalizability of existing API-based systems. We present Owl-AuraID, a software-hardware collaborative embodied agent system that adopts a GUI-native paradigm to operate instruments through the same interfaces as human experts. Its skill-centric framework integrates Type-1 (GUI operation) and Type-2 (data analysis) skills into end-to-end workflows, connecting physical sample handling with scientific interpretation. Owl-AuraID demonstrates broad coverage across ten categories of precision instruments and diverse workflows, including multimodal spectral analysis, microscopic imaging, and crystallographic analysis, supporting modalities such as FTIR, NMR, AFM, and TGA. Overall, Owl-AuraID provides a practical, extensible foundation for autonomous laboratories and illustrates a path toward evolving laboratory intelligence through reusable operational and analytical skills. The code are available at https://github.com/OpenOwlab/AuraID.

CEMay 4, 2025Code
Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking

Muyao Zhong, Yushi Lin, Peng Yang

The Limit Order Book (LOB), the mostly fundamental data of the financial market, provides a fine-grained view of market dynamics while poses significant challenges in dealing with the esteemed deep models due to its strong autocorrelation, cross-feature constrains, and feature scale disparity. Existing approaches often tightly couple representation learning with specific downstream tasks in an end-to-end manner, failed to analyze the learned representations individually and explicitly, limiting their reusability and generalization. This paper conducts the first systematic comparative study of LOB representation learning, aiming to identify the effective way of extracting transferable, compact features that capture essential LOB properties. We introduce LOBench, a standardized benchmark with real China A-share market data, offering curated datasets, unified preprocessing, consistent evaluation metrics, and strong baselines. Extensive experiments validate the sufficiency and necessity of LOB representations for various downstream tasks and highlight their advantages over both the traditional task-specific end-to-end models and the advanced representation learning models for general time series. Our work establishes a reproducible framework and provides clear guidelines for future research. Datasets and code will be publicly available at https://github.com/financial-simulation-lab/LOBench.

DCSep 9, 2024
Joint Model Assignment and Resource Allocation for Cost-Effective Mobile Generative Services

Shuangwei Gao, Peng Yang, Yuxin Kong et al.

Artificial Intelligence Generated Content (AIGC) services can efficiently satisfy user-specified content creation demands, but the high computational requirements pose various challenges to supporting mobile users at scale. In this paper, we present our design of an edge-enabled AIGC service provisioning system to properly assign computing tasks of generative models to edge servers, thereby improving overall user experience and reducing content generation latency. Specifically, once the edge server receives user requested task prompts, it dynamically assigns appropriate models and allocates computing resources based on features of each category of prompts. The generated contents are then delivered to users. The key to this system is a proposed probabilistic model assignment approach, which estimates the quality score of generated contents for each prompt based on category labels. Next, we introduce a heuristic algorithm that enables adaptive configuration of both generation steps and resource allocation, according to the various task requests received by each generative model on the edge.Simulation results demonstrate that the designed system can effectively enhance the quality of generated content by up to 4.7% while reducing response delay by up to 39.1% compared to benchmarks.

NEJan 27
Posterior Distribution-assisted Evolutionary Dynamic Optimization as an Online Calibrator for Complex Social Simulations

Peng Yang, Zhenhua Yang, Boquan Jiang et al.

The calibration of simulators for complex social systems aims to identify the optimal parameter that drives the output of the simulator best matching the target data observed from the system. As many social systems may change internally over time, calibration naturally becomes an online task, requiring parameters to be updated continuously to maintain the simulator's fidelity. In this work, the online setting is first formulated as a dynamic optimization problem (DOP), requiring the search for a sequence of optimal parameters that fit the simulator to real system changes. However, in contrast to traditional DOP formulations, online calibration explicitly incorporates the observational data as the driver of environmental dynamics. Due to this fundamental difference, existing Evolutionary Dynamic Optimization (EDO) methods, despite being extensively studied for black-box DOPs, are ill-equipped to handle such a scenario. As a result, online calibration problems constitute a new set of challenging DOPs. Here, we propose to explicitly learn the posterior distributions of the parameters and the observational data, thereby facilitating both change detection and environmental adaptation of existing EDOs for this scenario. We thus present a pretrained posterior model for implementation, and fine-tune it during the optimization. Extensive tests on both economic and financial simulators verify that the posterior distribution strongly promotes EDOs in such DOPs widely existed in social science.

CVMay 28, 2025Code
Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

Hang Chen, Maoyuan Ye, Peng Yang et al.

Power transmission corridor hazard segmentation (PTCHS) aims to separate transmission equipment and surrounding hazards from complex background, conveying great significance to maintaining electric power transmission safety. Recently, the Segment Anything Model (SAM) has emerged as a foundational vision model and pushed the boundaries of segmentation tasks. However, SAM struggles to deal with the target objects in complex transmission corridor scenario, especially those with fine structure. In this paper, we propose ELE-SAM, adapting SAM for the PTCHS task. Technically, we develop a Context-Aware Prompt Adapter to achieve better prompt tokens via incorporating global-local features and focusing more on key regions. Subsequently, to tackle the hazard objects with fine structure in complex background, we design a High-Fidelity Mask Decoder by leveraging multi-granularity mask features and then scaling them to a higher resolution. Moreover, to train ELE-SAM and advance this field, we construct the ELE-40K benchmark, the first large-scale and real-world dataset for PTCHS including 44,094 image-mask pairs. Experimental results for ELE-40K demonstrate the superior performance that ELE-SAM outperforms the baseline model with the average 16.8% mIoU and 20.6% mBIoU performance improvement. Moreover, compared with the state-of-the-art method on HQSeg-44K, the average 2.9% mIoU and 3.8% mBIoU absolute improvements further validate the effectiveness of our method on high-quality generic object segmentation. The source code and dataset are available at https://github.com/Hhaizee/ELE-SAM.

CVMay 15, 2023Code
Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions

Fei Du, Peng Yang, Qi Jia et al.

In this paper, our goal is to design a simple learning paradigm for long-tail visual recognition, which not only improves the robustness of the feature extractor but also alleviates the bias of the classifier towards head classes while reducing the training skills and overhead. We propose an efficient one-stage training strategy for long-tailed visual recognition called Global and Local Mixture Consistency cumulative learning (GLMC). Our core ideas are twofold: (1) a global and local mixture consistency loss improves the robustness of the feature extractor. Specifically, we generate two augmented batches by the global MixUp and local CutMix from the same batch data, respectively, and then use cosine similarity to minimize the difference. (2) A cumulative head tail soft label reweighted loss mitigates the head class bias problem. We use empirical class frequencies to reweight the mixed label of the head-tail class for long-tailed data and then balance the conventional loss and the rebalanced loss with a coefficient accumulated by epochs. Our approach achieves state-of-the-art accuracy on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT datasets. Additional experiments on balanced ImageNet and CIFAR demonstrate that GLMC can significantly improve the generalization of backbones. Code is made publicly available at https://github.com/ynu-yangpeng/GLMC.

76.9CEApr 20
EvoMarket: A High-Fidelity and Scalable Financial Market Simulator

Muyao Zhong, Zhenhua Yang, Yuxiang Liu et al.

High-fidelity, scalable market simulation is a key instrument for mechanism evaluation, stress testing, and counterfactual policy analysis. Yet existing simulators rarely achieve \emph{mechanism fidelity} beyond single-asset intraday settings, \emph{microstructure fidelity} against historical limit order books (LOB), and \emph{computational tractability} at market scale in a single system. This paper presents \textit{EvoMarket}, a discrete-event, multi-agent financial market simulator designed for intervention-oriented experiments in multi-asset and cross-day environments. EvoMarket couples a high-throughput execution core (optimized LOB data structures, hierarchical scheduling under propagation delays, and asynchronous per-asset matching) with explicit institutional mechanisms (market calendars, opening call auctions, price limits, and T+1 settlement). To avoid expensive black-box calibration, EvoMarket introduces an Oracle-guided in-run self-calibration mechanism that interprets microstructure discrepancy as missing order flow and synthesizes corrective orders at recording checkpoints. Experiments on China A-share order-flow and LOB data show close replay alignment over five trading days, fidelity gains from budgeted in-run calibration across depth levels, broad agent order-space coverage, and scalable performance under increasing input order rates and market breadth. We further demonstrate cross-asset linkage and event-study style intervention evaluation that produces structured dependence and interpretable event-time responses.

57.1AIMay 9
SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference

Ziqi Zhou, Peng Yang, Yuxin Liang et al.

The expansion of Artificial Intelligence-generated content service requires diffusion model serving to simultaneously achieve high throughput and low task end-to-end (E2E) latency. However, existing continuous batching methods suffer from severe resource contention during UNet-VAE concurrency, leading to latency spikes. Furthermore, concurrent multi-task scheduling entails a trade-off between UNet throughput and VAE latency across varying scheduling strategies. To address these, we propose SynerDiff, an efficient continuous batching system built on intra-inter level synergy. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, leveraging components' differential sensitivity to scheduling granularities, a threshold-aware scheduler plans concurrent sequences and tunes intra-concurrency decisions to minimize VAE latency while maintaining UNet within high-throughput threshold. Additionally, a feedback controller dynamically adjusts this threshold based on queue loads to boost system capacity ceiling. Experimental results show that, SynerDiff improves throughput by 1.6$\times$ and decreases both average E2E and P99 tail latencies by up to 78.7\%, compared to benchmarks while guaranteeing high image fidelity.

50.4MMMay 9
Accelerating Multi-Condition T2I Generation via Adaptive Condition Offloading and Pruning

Yuxin Kong, Peng Yang, Chongbin Yi et al.

Text-to-image (T2I) generation using multiple conditions enables fine-grained user control on the generated image. Yet, incorporating multi-condition inputs incurs substantial computation and communication overhead, due to additional preprocessing subtasks and control optimizations. It hence leads to unacceptable generation latency. In this paper, we propose an end-edge collaborative system design to accelerate multi-condition T2I generation through adaptive condition offloading and pruning. Extensive offline profiling reveal that, different conditions exhibit significant diversity in computation and communication costs. To this end, we propose a \textit{Subtask Manager} that jointly optimizes condition inference offloading and bandwidth allocation using a heuristic algorithm, balancing local and edge execution delays to minimize overall preprocessing latency. Then, we design a lightweight feature-driven \textit{Conditioning Scale Estimator} that evaluates the contribution of each condition by analyzing its feature activation strength and overlap with other conditions. This allows adaptive conditioning scale selection and pruning of insignificant conditions, thereby accelerating the denoising process. Extensive experimental results show that our system reduces latency by nearly 25\% and improves 6\% average generation quality, outperforming other benchmarks.

22.2ROMay 9
Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems

Jiaxin Liu, Peng Yang, Yuping Li et al.

Driven by the rapid expansion of e-commerce and small-batch production, the size of the intralogistics load unit of finished goods, semi-finished goods and raw materials is steadily shrinking. Totes are gradually replacing pallets as the primary handling and storage container. This shift has propelled tote-handling robotic systems to the forefront of automation order fulfillment centers. The order-fulfillment decisions of tote-handling robotic systems share a common order-tote-robot sequential decision-making nature. Existing studies primarily focus on decision mechanisms tailored to particular systems, making it difficult to generalize or transfer them to other contexts. We propose an Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems (OLSF-TRS), a generalized and scalable sequential decision framework that combines structured combinatorial optimization with multi-agent reinforcement learning to coordinate order,tote, and robot decisions. On small-scale tote-handling robotic systems, OLSF-TRS achieves near-optimal performance with average optimality gaps below 3.5% across two distinct system configurations. In large-scale scenarios, OLSF-TRS consistently outperforms heuristic baselines across two different system types, reducing total tote movements by 8-12% and over 30% compared to SOTA rule-based approaches, while maintaining real-time responsiveness. These improvements translate into tangible operational benefits, including cost reduction, lower energy consumption, and enhanced throughput stability. The proposed framework delivers an efficient and unified order fulfillment decision-making framework for widely deployed tote-handling robotic systems,supporting high-quality order fulfillment in both e-commerce and industrial logistics sectors.

CLDec 22, 2024
Rationale-guided Prompting for Knowledge-based Visual Question Answering

Zhongjian Hu, Peng Yang, Bing Li et al.

Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. The PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.

CVFeb 21, 2024
RNDiff: Rainfall nowcasting with Condition Diffusion Model

Xudong Ling, Chaorong Li, Fengqing Qin et al.

Diffusion models are widely used in image generation because they can generate high-quality and realistic samples. This is in contrast to generative adversarial networks (GANs) and variational autoencoders (VAEs), which have some limitations in terms of image quality.We introduce the diffusion model to the precipitation forecasting task and propose a short-term precipitation nowcasting with condition diffusion model based on historical observational data, which is referred to as SRNDiff. By incorporating an additional conditional decoder module in the denoising process, SRNDiff achieves end-to-end conditional rainfall prediction. SRNDiff is composed of two networks: a denoising network and a conditional Encoder network. The conditional network is composed of multiple independent UNet networks. These networks extract conditional feature maps at different resolutions, providing accurate conditional information that guides the diffusion model for conditional generation.SRNDiff surpasses GANs in terms of prediction accuracy, although it requires more computational resources.The SRNDiff model exhibits higher stability and efficiency during training than GANs-based approaches, and generates high-quality precipitation distribution samples that better reflect future actual precipitation conditions. This fully validates the advantages and potential of diffusion models in precipitation forecasting, providing new insights for enhancing rainfall prediction.

CVJan 21
Enhancing Text-to-Image Generation via End-Edge Collaborative Hybrid Super-Resolution

Chongbin Yi, Yuxin Liang, Ziqi Zhou et al.

Artificial Intelligence-Generated Content (AIGC) has made significant strides, with high-resolution text-to-image (T2I) generation becoming increasingly critical for improving users' Quality of Experience (QoE). Although resource-constrained edge computing adequately supports fast low-resolution T2I generations, achieving high-resolution output still faces the challenge of ensuring image fidelity at the cost of latency. To address this, we first investigate the performance of super-resolution (SR) methods for image enhancement, confirming a fundamental trade-off that lightweight learning-based SR struggles to recover fine details, while diffusion-based SR achieves higher fidelity at a substantial computational cost. Motivated by these observations, we propose an end-edge collaborative generation-enhancement framework. Upon receiving a T2I generation task, the system first generates a low-resolution image based on adaptively selected denoising steps and super-resolution scales at the edge side, which is then partitioned into patches and processed by a region-aware hybrid SR policy. This policy applies a diffusion-based SR model to foreground patches for detail recovery and a lightweight learning-based SR model to background patches for efficient upscaling, ultimately stitching the enhanced ones into the high-resolution image. Experiments show that our system reduces service latency by 33% compared with baselines while maintaining competitive image quality.

LGMay 14, 2024
Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models

Bingdong Li, Zixiang Di, Yongfan Lu et al.

Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs). However, effectively modeling complex distributions of the Pareto optimal solutions is difficult with limited function evaluations. Existing Pareto set learning algorithms may exhibit considerable instability in such expensive scenarios, leading to significant deviations between the obtained solution set and the Pareto set (PS). In this paper, we propose a novel Composite Diffusion Model based Pareto Set Learning algorithm, namely CDM-PSL, for expensive MOBO. CDM-PSL includes both unconditional and conditional diffusion model for generating high-quality samples. Besides, we introduce an information entropy based weighting method to balance different objectives of EMOPs. This method is integrated with the guiding strategy, ensuring that all the objectives are appropriately balanced and given due consideration during the optimization process; Extensive experimental results on both synthetic benchmarks and real-world problems demonstrates that our proposed algorithm attains superior performance compared with various state-of-the-art MOBO algorithms.

CVMar 13, 2025
MouseGPT: A Large-scale Vision-Language Model for Mouse Behavior Analysis

Teng Xu, Taotao Zhou, Youjia Wang et al.

Analyzing animal behavior is crucial in advancing neuroscience, yet quantifying and deciphering its intricate dynamics remains a significant challenge. Traditional machine vision approaches, despite their ability to detect spontaneous behaviors, fall short due to limited interpretability and reliance on manual labeling, which restricts the exploration of the full behavioral spectrum. Here, we introduce MouseGPT, a Vision-Language Model (VLM) that integrates visual cues with natural language to revolutionize mouse behavior analysis. Built upon our first-of-its-kind dataset - incorporating pose dynamics and open-vocabulary behavioral annotations across over 42 million frames of diverse psychiatric conditions - MouseGPT provides a novel, context-rich method for comprehensive behavior interpretation. Our holistic analysis framework enables detailed behavior profiling, clustering, and novel behavior discovery, offering deep insights without the need for labor - intensive manual annotation. Evaluations reveal that MouseGPT surpasses existing models in precision, adaptability, and descriptive richness, positioning it as a transformative tool for ethology and for unraveling complex behavioral dynamics in animal models.

LGJul 27, 2025
Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining

Junjie Zhao, Chengxi Zhang, Chenkai Wang et al.

Reinforcement learning (RL) has successfully automated the complex process of mining formulaic alpha factors, for creating interpretable and profitable investment strategies. However, existing methods are hampered by the sparse rewards given the underlying Markov Decision Process. This inefficiency limits the exploration of the vast symbolic search space and destabilizes the training process. To address this, Trajectory-level Reward Shaping (TLRS), a novel reward shaping method, is proposed. TLRS provides dense, intermediate rewards by measuring the subsequence-level similarity between partially generated expressions and a set of expert-designed formulas. Furthermore, a reward centering mechanism is introduced to reduce training variance. Extensive experiments on six major Chinese and U.S. stock indices show that TLRS significantly improves the predictive power of mined factors, boosting the Rank Information Coefficient by 9.29% over existing potential-based shaping algorithms. Notably, TLRS achieves a major leap in computational efficiency by reducing its time complexity with respect to the feature dimension from linear to constant, which is a significant improvement over distance-based baselines.

CEFeb 12, 2025
Time Series Analysis in Frequency Domain: A Survey of Open Challenges, Opportunities and Benchmarks

Qianru Zhang, Yuting Sun, Honggang Wen et al.

Frequency-domain analysis has emerged as a powerful paradigm for time series analysis, offering unique advantages over traditional time-domain approaches while introducing new theoretical and practical challenges. This survey provides a comprehensive examination of spectral methods from classical Fourier analysis to modern neural operators, systematically summarizing three open challenges in current research: (1) causal structure preservation during spectral transformations, (2) uncertainty quantification in learned frequency representations, and (3) topology-aware analysis for non-Euclidean data structures. Through rigorous reviewing of over 100 studies, we develop a unified taxonomy that bridges conventional spectral techniques with cutting-edge machine learning approaches, while establishing standardized benchmarks for performance evaluation. Our work identifies key knowledge gaps in the field, particularly in geometric deep learning and quantum-enhanced spectral analysis. The survey offers practitioners a systematic framework for method selection and implementation, while charting promising directions for future research in this rapidly evolving domain.

CLDec 24, 2024
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering

Zhongjian Hu, Peng Yang, Bing Li et al.

Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However existing methods still have challenges: the inability to use external tools autonomously, and the inability to work in teams. Humans tend to know whether they need to use external tools when they encounter a new question, e.g., they tend to be able to give a direct answer to a familiar question, whereas they tend to use tools such as search engines when they encounter an unfamiliar question. In addition, humans also tend to collaborate and discuss with others to get better answers. Inspired by this, we propose the multi-agent voting framework. We design three LLM-based agents that simulate different levels of staff in a team, and assign the available tools according to the levels. Each agent provides the corresponding answer, and finally all the answers provided by the agents are voted to get the final answer. Experiments on OK-VQA and A-OKVQA show that our approach outperforms other baselines by 2.2 and 1.0, respectively.

LGMay 14, 2024
Context-aware Diversity Enhancement for Neural Multi-Objective Combinatorial Optimization

Yongfan Lu, Zixiang Di, Bingdong Li et al.

Multi-objective combinatorial optimization (MOCO) problems are prevalent in various real-world applications. Most existing neural MOCO methods rely on problem decomposition to transform an MOCO problem into a series of singe-objective combinatorial optimization (SOCO) problems and train attention models based on a single-step and deterministic greedy rollout. However, inappropriate decomposition and undesirable short-sighted behaviors of previous methods tend to induce a decline in diversity. To address the above limitation, we design a Context-aware Diversity Enhancement algorithm named CDE, which casts the neural MOCO problems as conditional sequence modeling via autoregression (node-level context awareness) and establishes a direct relationship between the mapping of preferences and diversity indicator of reward based on hypervolume expectation maximization (solution-level context awareness). Based on the solution-level context awareness, we further propose a hypervolume residual update strategy to enable the Pareto attention model to capture both local and non-local information of the Pareto set/front. The proposed CDE can effectively and efficiently grasp the context information, resulting in diversity enhancement. Experimental results on three classic MOCO problems demonstrate that our CDE outperforms several state-of-the-art baselines.

CPAug 23, 2025
Detecting Multilevel Manipulation from Limit Order Book via Cascaded Contrastive Representation Learning

Yushi Lin, Peng Yang

Trade-based manipulation (TBM) undermines the fairness and stability of financial markets drastically. Spoofing, one of the most covert and deceptive TBM strategies, exhibits complex anomaly patterns across multilevel prices, while often being simplified as a single-level manipulation. These patterns are usually concealed within the rich, hierarchical information of the Limit Order Book (LOB), which is challenging to leverage due to high dimensionality and noise. To address this, we propose a representation learning framework combining a cascaded LOB representation architecture with supervised contrastive learning. Extensive experiments demonstrate that our framework consistently improves detection performance across diverse models, with Transformer-based architectures achieving state-of-the-art results. In addition, we conduct systematic analyses and ablation studies to investigate multilevel manipulation and the contributions of key components for detection, offering broader insights into representation learning and anomaly detection for complex time series data.

LGJan 25, 2025
Hardware-Aware DNN Compression for Homogeneous Edge Devices

Kunlong Zhang, Guiying Li, Ning Lu et al.

Deploying deep neural networks (DNNs) across homogeneous edge devices (the devices with the same SKU labeled by the manufacturer) often assumes identical performance among them. However, once a device model is widely deployed, the performance of each device becomes different after a period of running. This is caused by the differences in user configurations, environmental conditions, manufacturing variances, battery degradation, etc. Existing DNN compression methods have not taken this scenario into consideration and can not guarantee good compression results in all homogeneous edge devices. To address this, we propose Homogeneous-Device Aware Pruning (HDAP), a hardware-aware DNN compression framework explicitly designed for homogeneous edge devices, aiming to achieve optimal average performance of the compressed model across all devices. To deal with the difficulty of time-consuming hardware-aware evaluations for thousands or millions of homogeneous edge devices, HDAP partitions all the devices into several device clusters, which can dramatically reduce the number of devices to evaluate and use the surrogate-based evaluation instead of hardware evaluation in real-time. Experiments on ResNet50 and MobileNetV1 with the ImageNet dataset show that HDAP consistently achieves lower average inference latency compared with state-of-the-art methods, with substantial speedup gains (e.g., 2.86 $\times$ speedup at 1.0G FLOPs for ResNet50) on the homogeneous device clusters. HDAP offers an effective solution for scalable, high-performance DNN deployment methods for homogeneous edge devices.

LGOct 20, 2025
From Observations to Parameters: Detecting Changepoint in Nonlinear Dynamics with Simulation-based Inference

Xiangbo Deng, Cheng Chen, Peng Yang

Detecting regime shifts in chaotic time series is hard because observation-space signals are entangled with intrinsic variability. We propose Parameter--Space Changepoint Detection (Param--CPD), a two--stage framework that first amortizes Bayesian inference of governing parameters with a neural posterior estimator trained by simulation-based inference, and then applies a standard CPD algorithm to the resulting parameter trajectory. On Lorenz--63 with piecewise-constant parameters, Param--CPD improves F1, reduces localization error, and lowers false positives compared to observation--space baselines. We further verify identifiability and calibration of the inferred posteriors on stationary trajectories, explaining why parameter space offers a cleaner detection signal. Robustness analyses over tolerance, window length, and noise indicate consistent gains. Our results show that operating in a physically interpretable parameter space enables accurate and interpretable changepoint detection in nonlinear dynamical systems.

LGAug 29, 2025
Learning Unified Representations from Heterogeneous Data for Robust Heart Rate Modeling

Peng Yang, Zhengdong Huang, Zicheng Xie et al.

Heart rate prediction is vital for personalized health monitoring and fitness, while it frequently faces a critical challenge when deploying in real-world: data heterogeneity. We classify it in two key dimensions: source heterogeneity from fragmented device markets with varying feature sets, and user heterogeneity reflecting distinct physiological patterns across individuals and activities. Existing methods either discard device-specific information, or fail to model user-specific differences, limiting their real-world performance. To address this, we propose a framework that learns latent representations agnostic to both heterogeneity, enabling downstream predictors to work consistently under heterogeneous data patterns. Specifically, we introduce a random feature dropout strategy to handle source heterogeneity, making the model robust to various feature sets. To manage user heterogeneity, we employ a time-aware attention module to capture long-term physiological traits and use a contrastive learning objective to build a discriminative representation space. To reflect the heterogeneous nature of real-world data, we created and publicly released a new benchmark dataset, ParroTao. Evaluations on both ParroTao and the public FitRec dataset show that our model significantly outperforms existing baselines by 17% and 15%, respectively. Furthermore, analysis of the learned representations demonstrates their strong discriminative power, and one downstream application task confirm the practical value of our model.