Xiaoyang Zhao

h-index: 12
2 papers

2 Papers

DC | Oct 16, 2025 | Code
xLLM Technical Report

Tongxuan Liu, Tao Peng, Peijun Yang et al.

We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework designed for high-performance, large-scale, enterprise-grade serving, with deep optimizations for diverse AI accelerators. To meet these demands, xLLM adopts a novel decoupled service-engine architecture. At the service layer, xLLM-Service features an intelligent scheduling module that efficiently processes multimodal requests and co-locates online and offline tasks through unified elastic scheduling to maximize cluster utilization. This module also relies on a workload-adaptive dynamic Prefill-Decode (PD) disaggregation policy and a novel Encode-Prefill-Decode (EPD) disaggregation policy designed for multimodal inputs. Furthermore, it incorporates a distributed architecture that provides global KV Cache management and robust fault tolerance for high availability. At the engine layer, xLLM-Engine co-optimizes system and algorithm designs to fully saturate computing resources. This is achieved through comprehensive multi-layer execution pipeline optimizations, an adaptive graph mode, and xTensor memory management. xLLM-Engine also integrates algorithmic enhancements such as optimized speculative decoding and dynamic EPLB, which together substantially boost throughput and inference efficiency. Extensive evaluations demonstrate that xLLM delivers significantly superior performance and resource efficiency. Under identical TPOT (time per output token) constraints, xLLM achieves throughput up to 1.7x that of MindIE and 2.2x that of vLLM-Ascend with Qwen-series models, while maintaining an average throughput of 1.7x that of MindIE with DeepSeek-series models. The xLLM framework is publicly available at https://github.com/jd-opensource/xllm and https://github.com/jd-opensource/xllm-service.
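The throughput comparison above is stated "under identical TPOT constraints". As a rough illustration of what such a comparison involves, the sketch below computes time per output token using the commonly used definition (decode time divided by the number of tokens after the first) and a throughput ratio between two systems. The function names and every number are hypothetical and chosen only for illustration; none of this is taken from the xLLM code base or its evaluation.

```python
def tpot_ms(total_latency_ms: float, ttft_ms: float, output_tokens: int) -> float:
    """Time per output token, assuming the common definition:
    (end-to-end latency - time to first token) / (output tokens - 1)."""
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (total_latency_ms - ttft_ms) / (output_tokens - 1)


def meets_tpot_constraint(measured_tpot_ms: float, limit_ms: float) -> bool:
    """True if a request stays within the TPOT limit (hypothetical SLO check)."""
    return measured_tpot_ms <= limit_ms


def throughput_ratio(tokens_per_s_a: float, tokens_per_s_b: float) -> float:
    """How many times higher system A's throughput is than system B's."""
    return tokens_per_s_a / tokens_per_s_b


# Hypothetical request: 2100 ms end to end, 100 ms to first token, 101 tokens.
example_tpot = tpot_ms(2100.0, 100.0, 101)
within_limit = meets_tpot_constraint(example_tpot, limit_ms=50.0)

# Hypothetical throughputs of two systems measured under the same TPOT limit.
ratio = throughput_ratio(3000.0, 2000.0)
```

Comparing throughput only among configurations that satisfy the same TPOT limit keeps the comparison fair: a system cannot gain throughput simply by letting per-token latency grow.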

MM | Aug 26, 2018
Patch-based Contour Prior Image Denoising for Salt and Pepper Noise

Bo Fu, XiaoYang Zhao, Yi Li et al.

Salt and pepper noise poses a significant challenge for image denoising: how can the noise be removed cleanly while the details are retained effectively? In this paper, we propose a patch-based contour prior denoising approach for salt and pepper noise. First, the noisy image is cut into patches as the basic representation unit, and a discrete total variation model is designed to extract contour structures. Second, a weighted Euclidean distance is designed to search for the most similar patches, and the corresponding contour stencils are then extracted from these similar patches. Finally, we build a filter from the contour stencils in a regression framework. Numerical results illustrate that the proposed method is competitive with state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and visual quality.
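The similar-patch search step can be sketched as follows. The abstract does not specify the weighting scheme, so this example makes an assumption: pixels equal to 0 or 255 are treated as likely salt-and-pepper impulses and receive weight 0, while all other pixels receive weight 1. The helper names (`is_impulse`, `weighted_distance`, `most_similar`) are hypothetical and are not the authors' implementation.

```python
from typing import List

Patch = List[List[int]]  # small grayscale patch with values in 0..255


def is_impulse(value: int) -> bool:
    """Assumption: salt-and-pepper pixels take the extreme values 0 or 255."""
    return value == 0 or value == 255


def weighted_distance(p: Patch, q: Patch) -> float:
    """Weighted Euclidean distance between two equally sized patches.

    Assumed weights: 1 where both pixels look clean, 0 otherwise.
    The squared differences are averaged over the clean pixel pairs."""
    total = 0.0
    clean_pairs = 0
    for row_p, row_q in zip(p, q):
        for a, b in zip(row_p, row_q):
            if is_impulse(a) or is_impulse(b):
                continue  # weight 0: skip suspected noise
            total += (a - b) ** 2
            clean_pairs += 1
    if clean_pairs == 0:
        return float("inf")  # nothing reliable to compare
    return (total / clean_pairs) ** 0.5


def most_similar(ref: Patch, candidates: List[Patch], k: int) -> List[int]:
    """Indices of the k candidates closest to ref, nearest first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: weighted_distance(ref, candidates[i]))
    return order[:k]


# Toy 2x2 patches; the 255 and 0 entries stand in for corrupted pixels.
ref = [[10, 255], [12, 14]]
candidates = [
    [[10, 50], [12, 14]],
    [[200, 255], [0, 14]],
    [[11, 13], [0, 15]],
]
nearest = most_similar(ref, candidates, k=2)
```

Normalizing by the number of clean pixel pairs keeps distances comparable between patches that contain different amounts of noise, which is one plausible reason to prefer a weighted distance over a plain Euclidean one here.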