Zongwei Li

CR
h-index: 40
Papers: 19
Citations: 993
Novelty: 51%
AI Score: 58

19 Papers

CV · Oct 3, 2023 · Code
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Bin Zhu, Bin Lin, Munan Ning et al.

Video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking language as the bind across different modalities because the language modality is well-explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for other modalities with contrastive learning. As a result, all modalities are mapped to a shared feature space, implementing multi-modal semantic alignment. While LanguageBind ensures that we can extend VL modalities to N modalities, we also need a high-quality dataset with alignment data pairs centered on language. We thus propose VIDAL-10M, a dataset of Video, Infrared, Depth, and Audio data with their corresponding Language. In VIDAL-10M, all videos are from short video platforms with complete semantics rather than truncated segments from long videos, and all the video, depth, infrared, and audio modalities are aligned to their textual descriptions. LanguageBind has achieved superior performance on a wide range of 15 benchmarks covering video, audio, depth, and infrared. Moreover, multiple experiments have provided evidence for the effectiveness of LanguageBind in achieving indirect alignment and complementarity among diverse modalities. Code: https://github.com/PKU-YuanGroup/LanguageBind
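
The contrastive step described above (a frozen language encoder, with other modality encoders trained against it) can be illustrated with a small sketch. This is not the authors' implementation: the symmetric InfoNCE form, the temperature of 0.07, and the function names `l2_normalize` and `info_nce` are assumptions chosen for illustration, and the encoders themselves are replaced by hand-written toy embeddings.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(modality_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of modality_emb is assumed to be paired with row i of text_emb.
    In a LanguageBind-style setup the text embeddings would come from a
    frozen encoder; here both inputs are plain lists (an assumption).
    """
    a = [l2_normalize(v) for v in modality_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # Similarity matrix scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_on_diagonal(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            mx = max(row)  # subtract the max for numerical stability
            lse = mx + math.log(sum(math.exp(z - mx) for z in row))
            total += lse - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the modality-to-text and text-to-modality directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(transposed))

# Toy embeddings: three captions and three "audio" vectors.
text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
misaligned = [aligned[1], aligned[2], aligned[0]]  # pairs shifted by one

loss_aligned = info_nce(aligned, text)
loss_misaligned = info_nce(misaligned, text)
```

Minimizing this loss with respect to the modality encoder only, while the text side stays fixed, is what pulls every modality into the language encoder's feature space; the aligned batch above yields a much smaller loss than the shifted one.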

CL · May 21, 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

Tencent Hunyuan Team, Ao Liu, Botong Zhou et al. · tencent-ai

As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.

SE · Mar 1 · Code
FastCode: Fast and Cost-Efficient Code Understanding and Reasoning

Zhonghang Li, Zongwei Li, Yuxuan Chen et al.

Repository-scale code reasoning is a cornerstone of modern AI-assisted software engineering, enabling Large Language Models (LLMs) to handle complex workflows from program comprehension to complex debugging. However, balancing accuracy with context cost remains a significant bottleneck, as existing agentic approaches often waste computational resources through inefficient, iterative full-text exploration. To address this, we introduce FastCode, a framework that decouples repository exploration from content consumption. FastCode utilizes a structural scouting mechanism to navigate a lightweight semantic-structural map of the codebase, allowing the system to trace dependencies and pinpoint relevant targets without the overhead of full-text ingestion. By leveraging structure-aware navigation tools regulated by a cost-aware policy, the framework constructs high-value contexts in a single, optimized step. Extensive evaluations on the SWE-QA, LongCodeQA, LOC-BENCH, and GitTaskBench benchmarks demonstrate that FastCode consistently outperforms state-of-the-art baselines in reasoning accuracy while significantly reducing token consumption, validating the efficiency of scouting-first strategies for large-scale code reasoning. Source code is available at https://github.com/HKUDS/FastCode.

CL · Nov 4, 2024 · Code
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Xingwu Sun, Yanfeng Chen, Yiqing Huang et al. · tencent-ai

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. We also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

CR · Apr 16
NFTDELTA: Detecting Permission Control Vulnerabilities in NFT Contracts through Multi-View Learning

Hailu Kuang, Xiaoqi Li, Wenkai Li et al.

Permission control vulnerabilities in Non-fungible token (NFT) contracts can result in significant financial losses, as attackers may exploit these weaknesses to gain unauthorized access or circumvent critical permission checks. In this paper, we propose NFTDELTA, a framework that leverages static analysis and multi-view learning to detect permission control vulnerabilities in NFT contracts. Specifically, we extract comprehensive function Control Flow Graph (CFG) information via two views: sequence features (representing execution paths) and graph features (capturing structural control flow). These two views are then integrated to create a unified code representation. We also define three specific categories of permission control vulnerabilities and employ a custom detector to identify defects through multi-view feature similarity analysis. Our evaluation of 795 popular NFT collections identified 241 confirmed permission control vulnerabilities, comprising 214 cases of Bypass Auth Reentrancy, 15 of Weak Auth Validation, and 12 of Loose Permission Management. Manual verification demonstrates the detector's high reliability, achieving an average precision of 97.92% and an F1-score of 81.09%. Furthermore, NFTDELTA demonstrates enhanced efficiency and scalability, proving its effectiveness in securing NFT ecosystems.

CR · Apr 14
CKG-LLM: LLM-Assisted Detection of Smart Contract Access Control Vulnerabilities Based on Knowledge Graphs

Xiaoqi Li, Hailu Kuang, Wenkai Li et al.

Traditional approaches for smart contract analysis often rely on intermediate representations such as abstract syntax trees, control-flow graphs, or static single assignment form. However, these methods face limitations in capturing both semantic structures and control logic. Knowledge graphs, by contrast, offer a structured representation of entities and relations, enabling richer intermediate abstractions of contract code and supporting the use of graph query languages to identify rule-violating elements. This paper presents CKG-LLM, a framework for detecting access-control vulnerabilities in smart contracts. Leveraging the reasoning and code generation capabilities of large language models, CKG-LLM translates natural-language vulnerability patterns into executable queries over contract knowledge graphs to automatically locate vulnerable code elements. Experimental evaluation demonstrates that CKG-LLM achieves superior performance in detecting access-control vulnerabilities compared to existing tools. Finally, we discuss potential extensions of CKG-LLM as part of future research directions.

CR · Mar 13
Defensible Design for OpenClaw: Securing Autonomous Tool-Invoking Agents

Zongwei Li, Wenkai Li, Xiaoqi Li

OpenClaw-like agents offer substantial productivity benefits, yet they are insecure by default because they combine untrusted inputs, autonomous action, extensibility, and privileged system access within a single execution loop. We use OpenClaw as an exemplar of a broader class of agents that interact with interfaces, manipulate files, invoke tools, and install extensions in real operating environments. Consequently, their security should be treated as a software engineering problem rather than as a product-specific concern. To address these architectural vulnerabilities, we propose a blueprint for defensible design. We present a risk taxonomy, secure engineering principles, and a practical research agenda to institutionalize safety in agent construction. Our goal is to transition the community focus from isolated vulnerability patching toward systematic defensive engineering and robust deployment practices.

LG · Jan 4, 2025 · Code
DiffGraph: Heterogeneous Graph Diffusion Model

Zongwei Li, Lianghao Xia, Hua Hua et al.

Recent advances in Graph Neural Networks (GNNs) have revolutionized graph-structured data modeling, yet traditional GNNs struggle with complex heterogeneous structures prevalent in real-world scenarios. Despite progress in handling heterogeneous interactions, two fundamental challenges persist: noisy data significantly compromising embedding quality and learning performance, and existing methods' inability to capture intricate semantic transitions among heterogeneous relations, which impacts downstream predictions. To address these fundamental issues, we present the Heterogeneous Graph Diffusion Model (DiffGraph), a pioneering framework that introduces an innovative cross-view denoising strategy. This advanced approach transforms auxiliary heterogeneous data into target semantic spaces, enabling precise distillation of task-relevant information. At its core, DiffGraph features a sophisticated latent heterogeneous graph diffusion mechanism, implementing a novel forward and backward diffusion process for superior noise management. This methodology achieves simultaneous heterogeneous graph denoising and cross-type transition, while significantly simplifying graph generation through its latent-space diffusion capabilities. Through rigorous experimental validation on both public and industrial datasets, we demonstrate that DiffGraph consistently surpasses existing methods in link prediction and node classification tasks, establishing new benchmarks for robustness and efficiency in heterogeneous graph processing. The model implementation is publicly available at: https://github.com/HKUDS/DiffGraph.

SE · Dec 8, 2025
DeepCode: Open Agentic Coding

Zongwei Li, Zhonghang Li, Zirui Guo et al.

Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still face significant challenges in achieving high-fidelity document-to-codebase synthesis--such as scientific papers to code--primarily due to a fundamental conflict between information overload and the context bottlenecks of LLMs. In this work, we introduce DeepCode, a fully autonomous framework that fundamentally addresses this challenge through principled information-flow management. By treating repository synthesis as a channel optimization problem, DeepCode seamlessly orchestrates four information operations to maximize task-relevant signals under finite context budgets: source compression via blueprint distillation, structured indexing using stateful code memory, conditional knowledge injection via retrieval-augmented generation, and closed-loop error correction. Extensive evaluations on the PaperBench benchmark demonstrate that DeepCode achieves state-of-the-art performance, decisively outperforming leading commercial agents such as Cursor and Claude Code, and crucially, surpassing PhD-level human experts from top institutes on key reproduction metrics. By systematically transforming paper specifications into production-grade implementations comparable to human expert quality, this work establishes new foundations for autonomous scientific reproduction that can accelerate research evaluation and discovery.

IR · Jun 1, 2024 · Code
RecDiff: Diffusion Model for Social Recommendation

Zongwei Li, Lianghao Xia, Chao Huang

Social recommendation has emerged as a powerful approach to enhance personalized recommendations by leveraging the social connections among users, such as following and friend relations observed in online social platforms. The fundamental assumption of social recommendation is that socially-connected users exhibit homophily in their preference patterns. This means that users connected by social ties tend to have similar tastes in user-item activities, such as rating and purchasing. However, this assumption is not always valid due to the presence of irrelevant and false social ties, which can contaminate user embeddings and adversely affect recommendation accuracy. To address this challenge, we propose a novel diffusion-based social denoising framework for recommendation (RecDiff). Our approach utilizes a simple yet effective hidden-space diffusion paradigm to alleviate the noisy effect in the compressed and dense representation space. By performing multi-step noise diffusion and removal, RecDiff possesses a robust ability to identify and eliminate noise from the encoded user representations, even when the noise levels vary. The diffusion module is optimized in a downstream task-aware manner, thereby maximizing its ability to enhance the recommendation process. We conducted extensive experiments to evaluate the efficacy of our framework, and the results demonstrate its superiority in terms of recommendation accuracy, training efficiency, and denoising effectiveness. The source code for the model implementation is publicly available at: https://github.com/HKUDS/RecDiff.
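
The "multi-step noise diffusion" on hidden user representations can be pictured with a generic forward-noising sketch. The abstract does not give the noise schedule or parameterization, so everything specific here is an assumption: a DDPM-style Gaussian forward process, a linear beta schedule, and the hypothetical helper names `alpha_bars` and `forward_diffuse`. The learned removal (denoising) network is deliberately left out.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def forward_diffuse(x0, t, schedule, noise=None, rng=random):
    """Closed-form sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = schedule[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e
            for x, e in zip(x0, noise)]

schedule = alpha_bars(50)
user_embedding = [0.5, -1.0, 2.0]  # toy hidden-space user vector
# With zero noise the output is just the embedding shrunk by sqrt(abar_t).
shrunk = forward_diffuse(user_embedding, 49, schedule, noise=[0.0, 0.0, 0.0])
# With sampled noise the embedding is progressively corrupted.
noisy = forward_diffuse(user_embedding, 49, schedule, rng=random.Random(0))
```

A denoising model trained to invert such steps would then be applied to the social-side user embeddings; how RecDiff conditions and trains that model is described in the paper, not in this sketch.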

CL · Feb 4, 2025
SCALM: Detecting Bad Practices in Smart Contracts Through LLMs

Zongwei Li, Xiaoqi Li, Wenkai Li et al.

As the Ethereum platform continues to mature and gain widespread usage, it is crucial to maintain high standards of smart contract writing practices. While bad practices in smart contracts may not directly lead to security issues, they do elevate the risk of encountering problems. Therefore, to understand and avoid these bad practices, this paper introduces the first systematic study of bad practices in smart contracts, delving into over 35 specific issues. Specifically, we propose a large language model (LLM)-based framework, SCALM. It combines Step-Back Prompting and Retrieval-Augmented Generation (RAG) to identify and address various bad practices effectively. Our extensive experiments using multiple LLMs and datasets have shown that SCALM outperforms existing tools in detecting bad practices in smart contracts.

SE · Apr 1
SCPatcher: Automated Smart Contract Code Repair via Retrieval-Augmented Generation and Knowledge Graph

Xiaoqi Li, Shipeng Ye, Wenkai Li et al.

Smart contract vulnerabilities can cause substantial financial losses due to the immutability of code after deployment. While existing tools detect vulnerabilities, they cannot effectively repair them. In this paper, we propose SCPatcher, a framework that combines retrieval-augmented generation with a knowledge graph for automated smart contract repair. We construct a knowledge graph from 5,000 verified Ethereum contracts, extracting function-level relationships to build a semantic network. This graph serves as an external knowledge base that enhances Large Language Model reasoning and enables precise vulnerability patching. We introduce a two-stage repair strategy: initial knowledge-guided repair followed by Chain-of-Thought reasoning for complex vulnerabilities. Evaluated on a diverse set of vulnerable contracts, SCPatcher achieves 81.5% overall repair rate and 91.0% compilation pass rate, substantially outperforming existing methods.

CR · Apr 8
PSR2: A Phase-based Semantic Reasoning Framework for Atomicity Violation Detection via Contract Refinement

Xiaoqi Li, Xin Wang, Wenkai Li et al.

With the rapid advancement of decentralized applications, smart contract security faces severe challenges, particularly regarding atomicity violations in complex logic such as Oracle and NFT contracts. Traditional static analyzers are often limited by rigid rule sets and lack deep contextual awareness, leading to high false-positive and false-negative rates when identifying vulnerabilities that depend on intermediate state inconsistencies. To address these limitations, this paper proposes PSR2, a novel collaborative static analysis framework that integrates structural path searching with deterministic semantic reasoning. PSR2 utilizes a Graph Structure Analysis Module (GSAM) to identify suspicious execution sequences in control flow graphs and a Semantic Context Analysis Module (SCAM) to extract data dependencies and state facts from abstract syntax trees. A Fusion Decision Module (FDM) then performs formal cross validation to confirm vulnerabilities based on a unified atomicity inconsistency model. Experimental results on 1,600 contract samples demonstrate that PSR2 significantly outperforms pattern-matching baselines, achieving an F1-score of 94.69% in complex ERC-721 scenarios compared to 51.86% for existing tools. Ablation studies further confirm that our fusion logic effectively reduces the false-positive rate by nearly half compared to single module analysis.

CR · Apr 4
LiquiLM: Bridging the Semantic Gap in Liquidity Flaw Audit via DCN and LLMs

Zekai Liu, Xiaoqi Li, Wenkai Li et al.

Traditional consensus mechanisms, such as Proof of Stake (PoS), increasingly reveal an excessive dependency on large liquidity providers. Although the Proof of Liquidity (PoL) mechanism serves as a critical paradigm for incentivizing sustained liquidity provision and ensuring market stability, its transition from asset staking to active liquidity management significantly increases the complexity of underlying smart contract economic models and interaction logic. This renders hidden liquidity logic flaws difficult to detect via traditional methods, seriously threatening the system stability and user asset security of mainstream DeFi and emerging PoL ecosystems. To address this, we propose the LiquiLM framework, which integrates Large Language Models (LLMs) with a Dynamic Co-Attention Network (DCN). By establishing a dynamic interaction between liquidity-critical contracts and flaw descriptions, the framework effectively bridges the semantic gap between underlying code implementations and high-level liquidity intents. We evaluate the performance of LiquiLM on 1,490 validation contracts (covering precision, recall, specificity, and F1-score). The results show that it achieves significant effectiveness in auditing and explaining liquidity flaws: in experiments using Gemini 3 Pro and GPT-4o as backbone models, respectively, the F1-scores both exceed 90%. Furthermore, through an in-depth audit of 1,380 real-world PoL and Ethereum economic contracts, LiquiLM successfully identifies 238 high-risk contracts and assists in discovering 10 vulnerabilities that have received CVE certification.

SE · Apr 1
LibScan: Smart Contract Library Misuse Detection with Iterative Feedback and Static Verification

Yishun Wang, Wenkai Li, Xiaoqi Li et al.

Smart contracts are self-executing programs that manage financial transactions on blockchain networks. Developers commonly rely on third-party code libraries to improve both efficiency and security. However, improper use of these libraries can introduce hidden vulnerabilities that are difficult to detect, leading to significant financial losses. Existing automated tools struggle to identify such misuse because it often requires understanding the developer's intent rather than simply scanning for known code patterns. This paper presents LibScan, an automated detection framework that combines large language model (LLM)-based semantic reasoning with rule-based code analysis, identifying eight distinct categories of library misuse in smart contracts. To improve detection reliability, the framework incorporates an iterative self-correction mechanism that refines its analysis across multiple rounds, alongside a structured knowledge base derived from large-scale empirical studies of real-world misuse cases. Experiments conducted on 662 real-world smart contracts demonstrate that LibScan achieves an overall detection accuracy of 85.15%, outperforming existing tools by a margin of over 16 percentage points. Ablation experiments further confirm that combining both analysis approaches yields substantially better results than either method used independently.

CV · May 17, 2025
Facial Recognition Leveraging Generative Adversarial Networks

Zhongwen Li, Zongwei Li, Xiaoqi Li

Face recognition performance based on deep learning heavily relies on large-scale training data, which is often difficult to acquire in practical applications. To address this challenge, this paper proposes a GAN-based data augmentation method with three key contributions: (1) a residual-embedded generator to alleviate gradient vanishing/exploding problems, (2) an Inception ResNet-V1 based FaceNet discriminator for improved adversarial training, and (3) an end-to-end framework that jointly optimizes data generation and recognition performance. Experimental results demonstrate that our approach achieves stable training dynamics and significantly improves face recognition accuracy by 12.7% on the LFW benchmark compared to baseline methods, while maintaining good generalization capability with limited training samples.

CR · May 6, 2023
An Overview of AI and Blockchain Integration for Privacy-Preserving

Zongwei Li, Dechao Kong, Yuanzheng Niu et al.

With the widespread attention and application of artificial intelligence (AI) and blockchain technologies, privacy protection techniques arising from their integration are of notable significance. In addition to protecting the privacy of individuals, these techniques also guarantee the security and dependability of data. This paper first presents an overview of AI and blockchain, summarizing their combination along with derived privacy protection technologies. It then explores specific application scenarios in data encryption, de-identification, multi-tier distributed ledgers, and k-anonymity methods. Moreover, the paper evaluates five critical aspects of AI-blockchain-integration privacy protection systems, including authorization management, access control, data protection, network security, and scalability. Furthermore, it analyzes the deficiencies and their actual causes, offering corresponding suggestions. This research also classifies and summarizes privacy protection techniques based on AI-blockchain application scenarios and technical schemes. In conclusion, this paper outlines the future directions of privacy protection technologies emerging from AI and blockchain integration, including enhancing efficiency and security to achieve more comprehensive privacy protection.

COMP-PH · Mar 14, 2021
A Modified Batch Intrinsic Plasticity Method for Pre-training the Random Coefficients of Extreme Learning Machines

Suchuan Dong, Zongwei Li

In extreme learning machines (ELM) the hidden-layer coefficients are randomly set and fixed, while the output-layer coefficients of the neural network are computed by a least squares method. The randomly-assigned coefficients in ELM are known to influence its performance and accuracy significantly. In this paper we present a modified batch intrinsic plasticity (modBIP) method for pre-training the random coefficients in the ELM neural networks. The current method is devised based on the same principle as the batch intrinsic plasticity (BIP) method, namely, by enhancing the information transmission in every node of the neural network. It differs from BIP in two prominent aspects. First, modBIP does not involve the activation function in its algorithm, and it can be applied with any activation function in the neural network. In contrast, BIP employs the inverse of the activation function in its construction, and requires the activation function to be invertible (or monotonic). The modBIP method can work with the often-used non-monotonic activation functions (e.g. Gaussian, swish, Gaussian error linear unit, and radial-basis type functions), with which BIP breaks down. Second, modBIP generates target samples on random intervals with a minimum size, which leads to highly accurate computation results when combined with ELM. The combined ELM/modBIP method is markedly more accurate than ELM/BIP in numerical simulations. Ample numerical experiments are presented with shallow and deep neural networks for function approximation and boundary/initial value problems with partial differential equations. They demonstrate that the combined ELM/modBIP method produces highly accurate simulation results, and that its accuracy is insensitive to the random-coefficient initializations in the neural network. This is in sharp contrast with the ELM results without pre-training of the random coefficients.

NA · Dec 4, 2020
Local Extreme Learning Machines and Domain Decomposition for Solving Linear and Nonlinear Partial Differential Equations

Suchuan Dong, Zongwei Li

We present a neural network-based method for solving linear and nonlinear partial differential equations, by combining the ideas of extreme learning machines (ELM), domain decomposition and local neural networks. The field solution on each sub-domain is represented by a local feed-forward neural network, and $C^k$ continuity is imposed on the sub-domain boundaries. Each local neural network consists of a small number of hidden layers, while its last hidden layer can be wide. The weight/bias coefficients in all hidden layers of the local neural networks are pre-set to random values and are fixed, and only the weight coefficients in the output layers are training parameters. The overall neural network is trained by a linear or nonlinear least squares computation, not by the back-propagation type algorithms. We introduce a block time-marching scheme together with the presented method for long-time dynamic simulations. The current method exhibits a clear sense of convergence with respect to the degrees of freedom in the neural network. Its numerical errors typically decrease exponentially or nearly exponentially as the number of degrees of freedom increases. Extensive numerical experiments have been performed to demonstrate the computational performance of the presented method. We compare the current method with the deep Galerkin method (DGM) and the physics-informed neural network (PINN) in terms of the accuracy and computational cost. The current method exhibits a clear superiority, with its numerical errors and network training time considerably smaller (typically by orders of magnitude) than those of DGM and PINN. We also compare the current method with the classical finite element method (FEM). The computational performance of the current method is on par with, and oftentimes exceeds, the FEM performance.
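
The core ELM idea in the abstract above (hidden-layer coefficients set randomly and frozen, output-layer coefficients obtained by least squares rather than back-propagation) fits in a few lines. The sketch below is a single global network fitting a known function, not the paper's local, domain-decomposed PDE solver. The tanh activation, the weight range of [-2, 2], the small ridge term, and the helper names `solve_linear` and `fit_elm` are all assumptions made for illustration.

```python
import math
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_elm(xs, ys, n_hidden=20, ridge=1e-8, seed=0):
    """Fit a one-hidden-layer ELM to samples (xs, ys) and return the model."""
    rng = random.Random(seed)
    # Hidden-layer coefficients: drawn once at random, then never trained.
    w = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]
    b = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]

    def features(x):
        # Last-hidden-layer outputs plus a constant term for the output bias.
        return [math.tanh(wi * x + bi) for wi, bi in zip(w, b)] + [1.0]

    H = [features(x) for x in xs]
    m = n_hidden + 1
    # Ridge-stabilized normal equations: (H^T H + ridge * I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(h[i] * y for h, y in zip(H, ys)) for i in range(m)]
    beta = solve_linear(A, rhs)  # only the output coefficients are computed
    return lambda x: sum(bj * fj for bj, fj in zip(beta, features(x)))

# Fit sin(x) on [0, pi] from 60 samples and measure the worst-case error.
xs = [i * math.pi / 59 for i in range(60)]
ys = [math.sin(x) for x in xs]
model = fit_elm(xs, ys)
max_err = max(abs(model(x) - math.sin(x)) for x in xs)
```

For a PDE, the rows of the least-squares system would come from enforcing the equation, boundary conditions, and sub-domain continuity at collocation points instead of matching known function values; the normal equations are used here only to keep the example dependency-free, and a dedicated least-squares routine would be the more robust choice in practice.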