DCJan 17, 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-NativeYao Lu, Song Bian, Lequn Chen et al.
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.
LGJun 26, 2025
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized ClusterJi Qi, WenPeng Zhu, Li Li et al.
The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.
CRFeb 21, 2022
DECLOAK: Enable Secure and Cheap Multi-Party Transactions on Legacy Blockchains by a Minimally Trusted TEE NetworkQian Ren, Yue Li, Yingjun Wu et al.
As the confidentiality and scalability of smart contracts have become a crucial demand of blockchains, off-chain contract execution frameworks have been promising. Some have recently expanded off-chain contracts to Multi-Party Computation (MPC), which seek to transition the on-chain states by off-chain MPC. The most general problem among these solutions is MPT, since its off-chain MPC takes on- and off-chain inputs, delivers on- and off-chain outputs, and can be publicly verified by the blockchain, thus capable of covering more scenarios. However, existing Multi-Party Transaction (MPT) solutions lack at least one of data availability, financial fairness, delivery fairness, and delivery atomicity. These properties are crucially valued by communities, e.g., the Ethereum community, or users. Even worse, these solutions require high-cost interactions between the blockchain and off-chain systems. This paper proposes a novel MPT-enabled off-chain contract execution framework, DECLOAK. DECLOAK is the first to achieve data availability of MPT, and our method can apply to other fields that seek to persist user data on-chain. Moreover, DECLOAK solves all mentioned shortcomings with even lower gas costs and weaker assumptions. Specifically, DECLOAK tolerates all but one Byzantine party and TEE executors. Evaluating on 10 MPTs, DECLOAK reduces the gas cost of the SOTA, Cloak, by 65.6%. Consequently, we are the first to not only achieve such level secure MPT in practical assumption, but also demonstrate that evaluating MPT in the comparable gas cost to normal Ethereum transaction is possible. And the cost superiority of DECLOAK increases as the number of MPT parties grows.
CRJun 26, 2021
Cloak: Transitioning States on Legacy Blockchains Using Secure and Publicly Verifiable Off-Chain Multi-Party ComputationQian Ren, Yingjun Wu, Han Liu et al.
In recent years, the confidentiality of smart contracts has become a fundamental requirement for practical applications. While many efforts have been made to develop architectural capabilities for enforcing confidential smart contracts, a few works arise to extend confidential smart contracts to Multi-Party Computation (MPC), i.e., multiple parties jointly evaluate a transaction off-chain and commit the outputs on-chain without revealing their secret inputs/outputs to each other. However, existing solutions lack public verifiability and require O(n) transactions to enable negotiation or resist adversaries, thus suffering from inefficiency and compromised security. In this paper, we propose Cloak, a framework for enabling Multi-Party Transaction (MPT) on existing blockchains. An MPT refers to transitioning blockchain states by an publicly verifiable off-chain MPC. We identify and handle the challenges of securing MPT by harmonizing TEE and blockchain. Consequently, Cloak secures the off-chain nondeterministic negotiation process (a party joins an MPT without knowing identities or the total number of parties until the MPT proposal settles), achieves public verifiability (the public can validate that the MPT correctly handles the secret inputs/outputs from multiple parties and reads/writes states on-chain), and resists Byzantine adversaries. According to our proof, Cloak achieves better security with only 2 transactions, superior to previous works that achieve compromised security at O(n) transactions cost. By evaluating examples and real-world MPTs, the gas cost of Cloak reduces by 32.4% on average.