GR Mar 12, 2025
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Xiangyu Peng, Zangwei Zheng, Chenhui Shen et al.
Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to globally leading video generation models, including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.
CL Oct 24, 2020
New Approaches for Natural Language Understanding based on the Idea that Natural Language encodes both Information and its Processing Procedures
Limin Zhang
We must recognize that natural language is a way of encoding information, and that it encodes not only the information but also the procedures by which that information is processed. To understand natural language, just as when we conceive and design computer languages, the first step is to separate the information (or data) from the procedures that process it. In natural language, some processing procedures are encoded directly as the structure chunk and the pointer chunk (this paper reclassifies lexical chunks into the data chunk, the structure chunk, and the pointer chunk); some are implied in sentence structures; and some are requests expressed by information senders and processed by information receivers. For the data part, the paper discusses the classification encoding system for attribute information and the information organization architecture (including the constitutional structures of information sets and the hierarchy between information sets). The theory elaborated in section 2 is then verified with examples, which show that the approach achieves the goal of enabling machines to understand the information conveyed in a dialogue. In section 4, the author summarizes the basic conditions of "Understanding" and rethinks what "Understanding" is and how to proceed. The study provides a practical theoretical basis and research methods for NLU. It can also be applied to large-scale, multi-type information processing in artificial intelligence (AI).
IT Oct 19, 2019
Convolutional Neural Networks for Space-Time Block Coding Recognition
Wenjun Yan, Qing Ling, Limin Zhang
We apply the latest advances in machine learning with deep neural networks to the tasks of radio modulation recognition, channel coding recognition, and spectrum monitoring. This paper first proposes an identification algorithm for space-time block coding of a signal. Features that distinguish spatial multiplexing signals from Alamouti signals are extracted by a convolutional neural network after preprocessing the received sequence. Unlike other algorithms, this method requires no prior information about the channel coefficients or the noise power, and is consequently well suited to noncooperative contexts. Results show that the proposed algorithm performs well even at a low signal-to-noise ratio.
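
The two signal classes named in the abstract differ in structure at the transmitter, and that difference is what a classifier ultimately has to detect. The sketch below is not the paper's CNN or its preprocessing. It only illustrates the standard two-antenna Alamouti block code next to plain spatial multiplexing, with no channel and no noise. The function names (`alamouti_encode`, `spatial_multiplex`, `block_inner_products`) are hypothetical and chosen for illustration.

```python
def alamouti_encode(symbols):
    """Map symbol pairs (s1, s2) to Alamouti blocks: 2 antennas x 2 time slots."""
    if len(symbols) % 2 != 0:
        raise ValueError("need an even number of symbols")
    ant1, ant2 = [], []
    for i in range(0, len(symbols), 2):
        s1, s2 = complex(symbols[i]), complex(symbols[i + 1])
        # Slot 1 sends (s1, s2); slot 2 sends (-conj(s2), conj(s1)).
        ant1 += [s1, -s2.conjugate()]
        ant2 += [s2, s1.conjugate()]
    return ant1, ant2


def spatial_multiplex(symbols):
    """Send independent symbols on each antenna, with no redundancy."""
    if len(symbols) % 2 != 0:
        raise ValueError("need an even number of symbols")
    syms = [complex(s) for s in symbols]
    return syms[0::2], syms[1::2]


def block_inner_products(ant1, ant2):
    """Inner product of the two antenna streams over each 2-slot block."""
    out = []
    for t in range(0, len(ant1) - 1, 2):
        out.append(ant1[t] * ant2[t].conjugate()
                   + ant1[t + 1] * ant2[t + 1].conjugate())
    return out


if __name__ == "__main__":
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    print(block_inner_products(*alamouti_encode(qpsk)))
    print(block_inner_products(*spatial_multiplex(qpsk)))
```

For Alamouti blocks the inner product is zero by construction, while for spatial multiplexing it depends on the data. At a real receiver the unknown channel mixes the antenna streams, so this transmit-side check is not directly observable. That limitation is one motivation for learning features from the received sequence, as the abstract describes.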