5.5 AI Mar 18
Efficient Policy Learning with Hybrid Evaluation-Based Genetic Programming for Uncertain Agile Earth Observation Satellite Scheduling
Junhua Xue, Yuning Chen, Mingyan Shao et al.
The Uncertain Agile Earth Observation Satellite Scheduling Problem (UAEOSSP) is a novel combinatorial optimization problem and a practical engineering challenge that aligns with the current demands of space technology development. It incorporates uncertainties in profit, resource consumption, and visibility, which may render pre-planned schedules suboptimal or even infeasible. Genetic Programming Hyper-Heuristic (GPHH) shows promise for evolving interpretable scheduling policies; however, its simulation-based evaluation incurs high computational costs. Moreover, the design of the constructive method, denoted as the Online Scheduling Algorithm (OSA), directly affects fitness assessment, resulting in evaluation-dependent local optima within the policy space. To address these issues, this paper proposes Hybrid Evaluation-based Genetic Programming (HE-GP) for effectively solving the UAEOSSP. A Hybrid Evaluation (HE) mechanism is integrated into the policy-driven OSA, combining exact and approximate filtering modes: the exact mode ensures evaluation accuracy through elaborately designed constraint verification modules, while the approximate mode reduces computational overhead via simplified logic. HE-GP dynamically switches between evaluation models based on real-time evolutionary state information. Experiments on 16 simulated instance sets demonstrate that HE-GP significantly outperforms handcrafted heuristics and single-evaluation-based GPHH, achieving substantial reductions in computational cost while maintaining excellent scheduling performance across diverse scenarios. Specifically, the average training time of HE-GP was reduced by 17.77% compared to GP employing exclusively exact evaluation, while the best policy generated by HE-GP achieved the highest average rank across all scenarios.
59.0 NI May 4
Choir: Tackling RTBC Performance Impossible Triangle with 5G Collaboration
Wenji Du, Wanghong Yang, Baosen Zhao et al.
Real-time broadband communication (RTBC) scenarios, such as cloud virtual reality and 8K live streaming, further raise the bar for the performance triangle, requiring video bitrates exceeding 30 Mbps, tail delay below 50 ms, and fairness guarantees for multi-user concurrent access. Based on our testing and analysis, existing RTBC-oriented rate control solutions, including end-to-end algorithms and network-assisted algorithms, fail to satisfy all performance metrics simultaneously. The key reasons are the native dynamic delay and the physical-layer resource allocation strategy inherent to the 5G radio access network (RAN). These solutions lack adaptation to the 5G architecture, leading to reduced decision performance. This paper proposes Choir, a collaborative solution mainly deployed on 5G base stations that deeply integrates 5G radio characteristics and video streaming traffic patterns to guide efficient sender-side rate control. Extensive simulation and testbed evaluations demonstrate that Choir achieves high average bitrate, low tail delay, and inter-flow fairness across different 5G network scenarios.
SD Feb 27, 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu, Zhiwei Lin, Yixuan Zhou et al.
Conversational speech synthesis (CSS) aims to synthesize speech that is both contextually appropriate and expressive, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are limited to deterministic prediction, overlooking the diversity of potential responses. Moreover, they rarely employ language model (LM)-based TTS backbones, limiting the naturalness and quality of synthesized speech. To address these issues, we propose DiffCSS, a CSS framework that leverages diffusion models and an LM-based TTS backbone to generate diverse, expressive, and contextually coherent speech. A diffusion-based context-aware prosody predictor is proposed to sample diverse prosody embeddings conditioned on multimodal conversational context. A prosody-controllable LM-based TTS backbone is then developed to synthesize high-quality speech with the sampled prosody embeddings. Experimental results demonstrate that the speech synthesized by DiffCSS is more diverse, contextually coherent, and expressive than that of existing CSS systems.
DC Jun 20, 2025
Speeding up Local Optimization in Vehicle Routing with Tensor-based GPU Acceleration
Zhenyu Lei, Jin-Kao Hao, Qinghua Wu
Local search plays a central role in many effective heuristic algorithms for the vehicle routing problem (VRP) and its variants. However, neighborhood exploration is known to be computationally expensive and time-consuming, especially for large instances or problems with complex constraints. In this study, we explore a promising direction to address this challenge by introducing an original tensor-based GPU acceleration method designed to speed up the commonly used local search operators in vehicle routing. By using an attribute-based representation, the method offers broad extensibility, making it applicable to different VRP variants. Its low-coupling architecture, with intensive computations completely offloaded to the GPU, ensures seamless integration into various local search-based algorithms and frameworks, leading to significant improvements in computational efficiency and potentially improved solution quality. Through comparative experiments on benchmark instances of three routing problems, we demonstrate the substantial computational advantages of the proposed approach over traditional CPU-based implementations. We also analyze the strengths and limitations of the method in detail, offering insights into its performance characteristics and identifying potential bottlenecks in practical applications. These findings contribute to a better understanding of the method and suggest directions for future improvements.
AI Mar 14, 2024
A Multi-population Integrated Approach for Capacitated Location Routing
Pengfei He, Jin-Kao Hao, Qinghua Wu
The capacitated location-routing problem involves determining the depots from a set of candidate capacitated depot locations and finding the required routes from the selected depots to serve a set of customers while minimizing a cost function that includes the cost of opening the chosen depots, the fixed utilization cost per vehicle used, and the total cost (distance) of the routes. This paper presents a multi-population integrated framework in which a multi-depot edge assembly crossover generates promising offspring solutions from the perspective of both depot location and route edge assembly. The method includes an effective neighborhood-based local search, a feasibility-restoring procedure, and a diversification-oriented mutation. Of particular interest is the multi-population scheme, which organizes the population into multiple subpopulations based on depot configurations. Extensive experiments on 281 benchmark instances from the literature show that the algorithm performs remarkably well, improving 101 best-known results (new upper bounds) and matching 84 best-known results. Additional experiments are presented to gain insight into the role of the key elements of the algorithm.
SD Jul 7, 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Qinghua Wu, Quanbo Shen, Jian Luan et al.
In multi-speaker speech synthesis, data from different speakers usually show great diversity because the speakers may differ greatly in age, speaking style, emotion, and so on. It is important but challenging to improve the modeling capabilities for multi-speaker speech synthesis. To address this issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called the excitation spectrogram, is designed to directly guide the learning of harmonics in the mel-spectrogram, and 2) a conditional gated LSTM (CGLSTM) is proposed to control the flow of text content information through the network by re-weighting the gates of the LSTM using speaker information. The experiments show a significant reduction in the reconstruction error of the mel-spectrogram when training the multi-speaker model, and a great improvement is observed in the subjective evaluation of the speaker-adapted model.
AS Aug 3, 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang, Shan Yang, Qinghua Wu et al.
Attention-based seq2seq text-to-speech systems, especially those that use self-attention networks (SAN), have achieved state-of-the-art performance. However, an expressive corpus with rich prosody is still challenging to model because 1) prosodic aspects, which span different sentential granularities and mainly determine acoustic expressiveness, are difficult to quantize and label, and 2) the current seq2seq framework extracts prosodic information solely from a text encoder, which easily collapses to an averaged expression for expressive content. In this paper, we propose a context extractor, built upon a SAN-based text encoder, to sufficiently exploit the sentential context over an expressive corpus for seq2seq-based TTS. Our context extractor first collects prosody-related sentential context information from different SAN layers and then aggregates it to learn a comprehensive sentence representation that enhances the expressiveness of the final generated speech. Specifically, we investigate two methods of context aggregation: 1) direct aggregation, which directly concatenates the outputs of different SAN layers, and 2) weighted aggregation, which uses multi-head attention to automatically learn the contributions of different SAN layers. Experiments on two expressive corpora show that our approach can produce more natural speech with much richer prosodic variations, and that weighted aggregation is superior in modeling expressivity.
AI Jul 10, 2020
Solving the Clustered Traveling Salesman Problem via TSP methods
Yongliang Lu, Jin-Kao Hao, Qinghua Wu
The Clustered Traveling Salesman Problem (CTSP) is a variant of the popular Traveling Salesman Problem (TSP) arising from a number of real-life applications. In this work, we explore a transformation approach that solves the CTSP by converting it to the well-studied TSP. For this purpose, we first investigate a technique to convert a CTSP instance to a TSP instance and then apply powerful TSP solvers (including exact and heuristic solvers) to solve the resulting TSP instance. We want to answer the following questions: How do state-of-the-art TSP solvers perform on clustered instances converted from the CTSP? Do state-of-the-art TSP solvers compete well with the best-performing methods specifically designed for the CTSP? To this end, we present intensive computational experiments on various benchmark instances to draw conclusions.
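The abstract does not spell out the conversion technique, so the following is only a minimal sketch of one classical way to reduce a CTSP instance to a TSP instance: add a large constant to every inter-cluster distance so that an optimal TSP tour switches clusters as rarely as possible. The function names, the choice of `big_m`, the toy instance, and the brute-force solver are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import permutations


def ctsp_to_tsp(dist, cluster_of, big_m=None):
    """Return a TSP distance matrix with big_m added to inter-cluster edges."""
    n = len(dist)
    if big_m is None:
        # Assumed choice: strictly larger than the length of any tour.
        big_m = 1 + sum(max(row) for row in dist)
    tsp = [
        [
            dist[i][j] + (big_m if i != j and cluster_of[i] != cluster_of[j] else 0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return tsp, big_m


def brute_force_tsp(dist):
    """Exact solver for tiny instances only; stands in for a real TSP solver."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


def count_cluster_switches(tour, cluster_of):
    """Number of tour edges whose endpoints lie in different clusters."""
    n = len(tour)
    return sum(cluster_of[tour[k]] != cluster_of[tour[(k + 1) % n]] for k in range(n))


# Toy instance: six points on a line, with the two clusters interleaved.
positions = [0, 2, 4, 1, 3, 5]
cluster_of = [0, 0, 0, 1, 1, 1]
dist = [[abs(a - b) for b in positions] for a in positions]

tsp_dist, big_m = ctsp_to_tsp(dist, cluster_of)
best_len, best_tour = brute_force_tsp(tsp_dist)
switches = count_cluster_switches(best_tour, cluster_of)

# Remove the penalties to recover the tour length in the original metric.
ctsp_len = best_len - switches * big_m
plain_len, _ = brute_force_tsp(dist)
print(best_tour, switches, ctsp_len, plain_len)
```

Because `big_m` exceeds the length of any tour, a tour with more cluster switches than necessary can never beat one that visits each cluster contiguously, so the optimal tour of the transformed instance is a valid CTSP tour. In practice the brute-force search would be replaced by a dedicated exact or heuristic TSP solver.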