Secure & Personalized Music-to-Video Generation via CHARCHA
This work addresses the need for secure, personalized music video generation, offering an incremental improvement by integrating existing multimodal techniques with new security measures.
The paper tackles personalized music video generation with a fully-automated pipeline that uses listeners' images and music features such as lyrics and rhythm to create immersive, context-driven visuals. It also introduces CHARCHA, a novel facial identity verification protocol intended to ensure the ethical use of identity.
Music is a deeply personal experience, and our aim is to enhance it with a fully-automated pipeline for personalized music video generation. Our work allows listeners to be not just consumers but co-creators in the music video generation process by creating personalized, consistent, and context-driven visuals based on the lyrics, rhythm, and emotion in the music. The pipeline combines multimodal translation and generation techniques and applies low-rank adaptation to listeners' images to create immersive music videos that reflect both the music and the individual. To ensure the ethical use of users' identities, we also introduce CHARCHA (patent pending), a facial identity verification protocol that protects people against unauthorized use of their face while collecting authorized images from users for personalizing their videos. This paper thus provides a secure and innovative framework for creating deeply personalized music videos.
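The abstract mentions low-rank adaptation without giving details. As a rough illustration of the general technique only (not the authors' implementation), the sketch below computes the adapted weight W + (alpha / r) * B * A from a frozen weight matrix and two small trainable factors. The function names, the scaling factor `alpha`, and the initialization are assumptions taken from the commonly used formulation of low-rank adaptation, and plain Python lists stand in for tensors.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def init_adapter(d_out, d_in, r, seed=0):
    """Hypothetical helper: A gets small random values, B starts at zero,
    so the adapter contributes nothing before any training."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen d_out x d_in matrix; b: d_out x r; a: r x d_in.
    Only a and b would be trained; w stays fixed.
    """
    r = len(a)              # adapter rank
    scale = alpha / r       # assumed scaling convention
    delta = matmul(b, a)    # low-rank update, d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]
```

The appeal of this scheme for per-user personalization is the parameter count: a full update to a d_out x d_in matrix needs d_out * d_in values, while the adapter needs only r * (d_out + d_in), which is far smaller when r is small.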