Chengyuan Li

CV
h-index27
14papers
6,677citations
Novelty40%
AI Score33

14 Papers

CLJul 15, 2024
Qwen2 Technical Report

An Yang, Baosong Yang, Binyuan Hui et al.

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials including example code on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.

LGSep 30, 2024
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference

Ke Yi, Zengke Liu, Jianwei Zhang et al.

Large language models have demonstrated promising capabilities upon scaling up parameters. However, serving large language models incurs substantial computation and memory movement costs due to their large scale. Quantization methods have been employed to reduce service costs and latency. Nevertheless, outliers in activations hinder the development of INT4 weight-activation quantization. Existing approaches separate outliers and normal values into two matrices or migrate outliers from activations to weights, suffering from high latency or accuracy degradation. Based on observing activations from large language models, outliers can be classified into channel-wise and spike outliers. In this work, we propose Rotated Runtime Smooth (RRS), a plug-and-play activation smoother for quantization, consisting of Runtime Smooth and the Rotation operation. Runtime Smooth (RS) is introduced to eliminate channel-wise outliers by smoothing activations with channel-wise maximums during runtime. The rotation operation can narrow the gap between spike outliers and normal values, alleviating the effect of victims caused by channel-wise smoothing. The proposed method outperforms the state-of-the-art method in the LLaMA and Qwen families and improves WikiText-2 perplexity from 57.33 to 6.66 for INT4 inference.

CLJan 26, 2025Code
Qwen2.5-1M Technical Report

An Yang, Bowen Yu, Chengyuan Li et al.

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.

LGOct 28, 2022
A Long-term Dependent and Trustworthy Approach to Reactor Accident Prognosis based on Temporal Fusion Transformer

Chengyuan Li, Zhifang Qiu, Yugao Ma et al.

Prognosis of the reactor accident is a crucial way to ensure appropriate strategies are adopted to avoid radioactive releases. However, there is very limited research in the field of nuclear industry. In this paper, we propose a method for accident prognosis based on the Temporal Fusion Transformer (TFT) model with multi-headed self-attention and gating mechanisms. The method utilizes multiple covariates to improve prediction accuracy on the one hand, and quantile regression methods for uncertainty assessment on the other. The method proposed in this paper is applied to the prognosis after loss of coolant accidents (LOCAs) in HPR1000 reactor. Extensive experimental results show that the method surpasses novel deep learning-based prediction methods in terms of prediction accuracy and confidence. Furthermore, the interference experiments with different signal-to-noise ratios and the ablation experiments for static covariates further illustrate that the robustness comes from the ability to extract the features of static and historical covariates. In summary, this work for the first time applies the novel composite deep learning model TFT to the prognosis of key parameters after a reactor accident, and makes a positive contribution to the establishment of a more intelligent and staff-light maintenance method for reactor systems.

CLDec 19, 2024
Qwen2.5 Technical Report

Qwen, An Yang, Baosong Yang et al.

In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

SYAug 3, 2022
Post-hoc Interpretability based Parameter Selection for Data Oriented Nuclear Reactor Accident Diagnosis System

Chengyuan Li, Meifu Li, Zhifang Qiu

During applying data-oriented diagnosis systems to distinguishing the type of and evaluating the severity of nuclear power plant initial events, it is of vital importance to decide which parameters to be used as the system input. However, although several diagnosis systems have already achieved acceptable performance in diagnosis precision and speed, hardly have the researchers discussed the method of monitoring point choosing and its layout. For this reason, redundant measuring data are used to train the diagnostic model, leading to high uncertainty of the classification, extra training time consumption, and higher probability of overfitting while training. In this study, a method of choosing thermal hydraulics parameters of a nuclear power plant is proposed, using the theory of post-hoc interpretability theory in deep learning. At the start, a novel Time-sequential Residual Convolutional Neural Network (TRES-CNN) diagnosis model is introduced to identify the position and hydrodynamic diameter of breaks in LOCA, using 38 parameters manually chosen on HPR1000 empirically. Afterwards, post-hoc interpretability methods are applied to evaluate the attributions of diagnosis model's outputs, deciding which 15 parameters to be more decisive in diagnosing LOCA details. The results show that the TRES-CNN based diagnostic model successfully predicts the position and size of breaks in LOCA via selected 15 parameters of HPR1000, with 25% of time consumption while training the model compared the process using total 38 parameters. In addition, the relative diagnostic accuracy error is within 1.5 percent compared with the model using parameters chosen empirically, which can be regarded as the same amount of diagnostic reliability.

SPAug 30, 2022
Representation Learning based and Interpretable Reactor System Diagnosis Using Denoising Padded Autoencoder

Chengyuan Li, Zhifang Qiu, Zhangrui Yan et al.

With the mass construction of Gen III nuclear reactors, it is a popular trend to use deep learning (DL) techniques for fast and effective diagnosis of possible accidents. To overcome the common problems of previous work in diagnosing reactor accidents using deep learning theory, this paper proposes a diagnostic process that ensures robustness to noisy and crippled data and is interpretable. First, a novel Denoising Padded Autoencoder (DPAE) is proposed for representation extraction of monitoring data, with representation extractor still effective on disturbed data with signal-to-noise ratios up to 25.0 and monitoring data missing up to 40.0%. Secondly, a diagnostic framework using DPAE encoder for extraction of representations followed by shallow statistical learning algorithms is proposed, and such stepwise diagnostic approach is tested on disturbed datasets with 41.8% and 80.8% higher classification and regression task evaluation metrics, in comparison with the end-to-end diagnostic approaches. Finally, a hierarchical interpretation algorithm using SHAP and feature ablation is presented to analyze the importance of the input monitoring parameters and validate the effectiveness of the high importance parameters. The outcomes of this study provide a referential method for building robust and interpretable intelligent reactor anomaly diagnosis systems in scenarios with high safety requirements.

LGMar 30, 2024
Generative AI Models for Different Steps in Architectural Design: A Literature Review

Chengyuan Li, Tianyu Zhang, Xusheng Du et al.

Recent advances in generative artificial intelligence (AI) technologies have been significantly driven by models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and denoising diffusion probabilistic models (DDPMs). Although architects recognize the potential of generative AI in design, personal barriers often restrict their access to the latest technological developments, thereby causing the application of generative AI in architectural design to lag behind. Therefore, it is essential to comprehend the principles and advancements of generative AI models and analyze their relevance in architecture applications. This paper first provides an overview of generative AI technologies, with a focus on probabilistic diffusion models (DDPMs), 3D generative models, and foundation models, highlighting their recent developments and main application scenarios. Then, the paper explains how the abovementioned models could be utilized in architecture. We subdivide the architectural design process into six steps and review related research projects in each step from 2020 to the present. Lastly, this paper discusses potential future directions for applying generative AI in the architectural design steps. This research can help architects quickly understand the development and latest progress of generative AI and contribute to the further development of intelligent architecture.

CVJan 7, 2025
KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration

Chengyuan Li, Suyang Zhou, Jieping Kong et al.

Zero-shot anomaly detection (ZSAD) identifies anomalies without needing training samples from the target dataset, essential for scenarios with privacy concerns or limited data. Vision-language models like CLIP show potential in ZSAD but have limitations: relying on manually crafted fixed textual descriptions or anomaly prompts is time-consuming and prone to semantic ambiguity, and CLIP struggles with pixel-level anomaly segmentation, focusing more on global semantics than local details. To address these limitations, We introduce KAnoCLIP, a novel ZSAD framework that leverages vision-language models. KAnoCLIP combines general knowledge from a Large Language Model (GPT-3.5) and fine-grained, image-specific knowledge from a Visual Question Answering system (Llama3) via Knowledge-Driven Prompt Learning (KnPL). KnPL uses a knowledge-driven (KD) loss function to create learnable anomaly prompts, removing the need for fixed text prompts and enhancing generalization. KAnoCLIP includes the CLIP visual encoder with V-V attention (CLIP-VV), Bi-Directional Cross-Attention for Multi-Level Cross-Modal Interaction (Bi-CMCI), and Conv-Adapter. These components preserve local visual semantics, improve local cross-modal fusion, and align global visual features with textual information, enhancing pixel-level anomaly detection. KAnoCLIP achieves state-of-the-art performance in ZSAD across 12 industrial and medical datasets, demonstrating superior generalization compared to existing methods.

IRMay 2, 2023
Ripple Knowledge Graph Convolutional Networks For Recommendation Systems

Chen Li, Yang Cao, Ye Zhu et al.

Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model's interpretability and accuracy. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and makes a recommendation of suitable items. It combines knowledge graphs on both the item side and user side to enrich their representations to maximize the utilization of the abundant information in knowledge graphs. RKGCN is able to offer more personalized and relevant recommendations in three different scenarios. The experimental results show the superior effectiveness of our model over 5 baseline models on three real-world datasets including movies, books, and music.

CVSep 4, 2021
RiWNet: A moving object instance segmentation Network being Robust in adverse Weather conditions

Chenjie Wang, Chengyuan Li, Bin Luo et al.

Segmenting each moving object instance in a scene is essential for many applications. But like many other computer vision tasks, this task performs well in optimal weather, but then adverse weather tends to fail. To be robust in weather conditions, the usual way is to train network in data of given weather pattern or to fuse multiple sensors. We focus on a new possibility, that is, to improve its resilience to weather interference through the network's structural design. First, we propose a novel FPN structure called RiWFPN with a progressive top-down interaction and attention refinement module. RiWFPN can directly replace other FPN structures to improve the robustness of the network in non-optimal weather conditions. Then we extend SOLOV2 to capture temporal information in video to learn motion information, and propose a moving object instance segmentation network with RiWFPN called RiWNet. Finally, in order to verify the effect of moving instance segmentation in different weather disturbances, we propose a VKTTI-moving dataset which is a moving instance segmentation dataset based on the VKTTI dataset, taking into account different weather scenes such as rain, fog, sunset, morning as well as overcast. The experiment proves how RiWFPN improves the network's resilience to adverse weather effects compared to other FPN structures. We compare RiWNet to several other state-of-the-art methods in some challenging datasets, and RiWNet shows better performance especially under adverse weather conditions.

CVDec 18, 2020
Object Detection based on OcSaFPN in Aerial Images with Noise

Chengyuan Li, Jun Liu, Hailong Hong et al.

Taking the deep learning-based algorithms into account has become a crucial way to boost object detection performance in aerial images. While various neural network representations have been developed, previous works are still inefficient to investigate the noise-resilient performance, especially on aerial images with noise taken by the cameras with telephoto lenses, and most of the research is concentrated in the field of denoising. Of course, denoising usually requires an additional computational burden to obtain higher quality images, while noise-resilient is more of a description of the robustness of the network itself to different noises, which is an attribute of the algorithm itself. For this reason, the work will be started by analyzing the noise-resilient performance of the neural network, and then propose two hypotheses to build a noise-resilient structure. Based on these hypotheses, we compare the noise-resilient ability of the Oct-ResNet with frequency division processing and the commonly used ResNet. In addition, previous feature pyramid networks used for aerial object detection tasks are not specifically designed for the frequency division feature maps of the Oct-ResNet, and they usually lack attention to bridging the semantic gap between diverse feature maps from different depths. On the basis of this, a novel octave convolution-based semantic attention feature pyramid network (OcSaFPN) is proposed to get higher accuracy in object detection with noise. The proposed algorithm tested on three datasets demonstrates that the proposed OcSaFPN achieves a state-of-the-art detection performance with Gaussian noise or multiplicative noise. In addition, more experiments have proved that the OcSaFPN structure can be easily added to existing algorithms, and the noise-resilient ability can be effectively improved.

CVJul 26, 2020
U2-ONet: A Two-level Nested Octave U-structure with Multiscale Attention Mechanism for Moving Instances Segmentation

Chenjie Wang, Chengyuan Li, Bin Luo

Most scenes in practical applications are dynamic scenes containing moving objects, so segmenting accurately moving objects is crucial for many computer vision applications. In order to efficiently segment out all moving objects in the scene, regardless of whether the object has a predefined semantic label, we propose a two-level nested Octave U-structure network with a multiscale attention mechanism called U2-ONet. Each stage of U2-ONet is filled with our newly designed Octave ReSidual U-block (ORSU) to enhance the ability to obtain more context information at different scales while reducing spatial redundancy of feature maps. In order to efficiently train our multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding a knowledge matching loss to keep the optimization consistency. Experimental results show that our method achieves state-of-the-art performance in several general moving objects segmentation datasets.

CVMar 10, 2020
DymSLAM:4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Chenjie Wang, Bin Luo, Yun Zhang et al.

Most SLAM algorithms are based on the assumption that the scene is static. However, in practice, most scenes are dynamic which usually contains moving objects, these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system being capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D model of the moving objects and the trajectories of the camera and the moving objects. We at first detect and match the interesting points between successive frames by using traditional SLAM methods. Then the interesting points belonging to different motion models (including ego-motion and motion models of rigid moving objects) are segmented by a multi-model fitting approach. Based on the interesting points belonging to the ego-motion, we are able to estimate the trajectory of the camera and reconstruct the static background. The interesting points belonging to the motion models of rigid moving objects are then used to estimate their relative motion models to the camera and reconstruct the 3D models of the objects. We then transform the relative motion to the trajectories of the moving objects in the global reference frame. Finally, we then fuse the 3D models of the moving objects into the 3D map of the environment by considering their motion trajectories to obtain a 4D (3D+time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows the robot to be employed for high-level tasks, such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects were moving in a wide range.