MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text
This work addresses the need for convenient video creation tools for mobile device users, though it appears incremental because it builds on existing diffusion models.
The paper tackles the problem of automatically generating vertical mobile videos from text by presenting MobileVidFactory, a system that adapts a pretrained image diffusion model for video generation and matches audio from a database, resulting in a high-quality open-domain video generator for mobile devices.
Videos on mobile devices have recently become the most popular way to share and acquire information. To make creation more convenient for users, in this paper we present a system, MobileVidFactory, that automatically generates vertical mobile videos from little more than simple text input. Our system consists of two parts: basic and customized generation. In basic generation, we take advantage of a pretrained image diffusion model and adapt it into a high-quality open-domain vertical video generator for mobile devices. For audio, our system retrieves a suitable background sound for the video from our large database. In addition, to produce customized content, our system allows users to add specified on-screen text to the video to enrich its visual expression, and to specify text to be read aloud automatically in a voice of their choice.
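The abstract does not say how the background sound is matched to the video. One common approach, shown here purely as an illustrative sketch and not as the authors' method, is to embed the text prompt and each audio clip in a shared vector space and return the clip with the highest cosine similarity. The function names, the toy three-dimensional embeddings, and the file names below are all hypothetical; a real system would obtain embeddings from a trained text-audio model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_audio(
    query_embedding: List[float],
    audio_db: Dict[str, List[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank audio clips by similarity to the query and return the top_k (name, score) pairs."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in audio_db.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy database: clip name -> precomputed embedding (assumed values).
audio_db = {
    "ocean_waves.wav": [0.9, 0.1, 0.0],
    "city_traffic.wav": [0.1, 0.9, 0.2],
    "birdsong.wav": [0.2, 0.1, 0.9],
}

# Stand-in for the embedding of a prompt such as "a beach at sunset" (assumed values).
query = [0.8, 0.2, 0.1]

best = retrieve_audio(query, audio_db, top_k=1)
print(best[0][0])
```

With these toy vectors the ocean clip ranks first because its embedding points in nearly the same direction as the query. For a large database, the linear scan would typically be replaced by an approximate nearest-neighbor index.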