CL, AI, CV, LG | Apr 9, 2023

Hi Sheldon! Creating Deep Personalized Characters from TV Shows

arXiv:2304.11093v1 | h-index: 17
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building interactive, personalized AI characters for entertainment or social applications. It is incremental: it builds on existing multimodal dialogue tasks, adding a new dataset and a baseline.

The paper tackles the problem of creating multimodal AI-generated digital characters that mimic specific TV show personalities, such as Sheldon from The Big Bang Theory. It proposes the Deep Personalized Character Creation (DPCC) task and collects a dataset (DPCD) with ~10k utterances and ~6 hours of audio/video per character, about 10 times larger than existing related datasets. Its experiments indicate that a baseline method can generate personalized multimodal responses.

Imagine an interesting multimodal interactive scenario in which you can see, hear, and chat with an AI-generated digital character who is capable of behaving like Sheldon from The Big Bang Theory, as a DEEP copy from appearance to personality. Towards this fantastic multimodal chatting scenario, we propose a novel task, named Deep Personalized Character Creation (DPCC): creating personalized multimodal chat characters from multimodal data such as TV shows. Specifically, given a single- or multi-modality input (text, audio, video), the goal of DPCC is to generate a multi-modality (text, audio, video) response that is well matched to the personality of a specific character such as Sheldon, and of high quality as well. To support this novel task, we further collect a character-centric multimodal dialogue dataset, named Deep Personalized Character Dataset (DPCD), from TV shows. DPCD contains character-specific multimodal dialogue data of ~10k utterances and ~6 hours of audio/video per character, which is around 10 times larger than existing related datasets. On DPCD, we present a baseline method for the DPCC task and create 5 Deep personalized digital Characters (DeepCharacters) from Big Bang TV Shows. We conduct both subjective and objective experiments to evaluate the multimodal responses from DeepCharacters in terms of characterization and quality. The results demonstrate that, on our collected DPCD dataset, the proposed baseline can create personalized digital characters that generate multimodal responses. Our collected DPCD dataset, the data collection code, and our baseline will be published soon.
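To make the DPCC input/output contract from the abstract concrete, here is a minimal Python sketch. It only encodes what the abstract states: a query may carry any one or more of text, audio, and video, and a response must carry all three. Every name in it (`MultimodalMessage`, `is_valid_query`, `is_valid_response`, the `*_path` fields) is hypothetical and is not taken from the paper or its code, which had not been released at the time of the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class MultimodalMessage:
    """One dialogue turn; each modality is optional (hypothetical schema)."""
    text: Optional[str] = None
    audio_path: Optional[str] = None  # assumed: path to an audio clip
    video_path: Optional[str] = None  # assumed: path to a video clip

    def modalities(self) -> Set[str]:
        """Return the names of the modalities present in this turn."""
        fields = {
            "text": self.text,
            "audio": self.audio_path,
            "video": self.video_path,
        }
        return {name for name, value in fields.items() if value is not None}


def is_valid_query(msg: MultimodalMessage) -> bool:
    """A DPCC input is single- or multi-modality: at least one modality."""
    return len(msg.modalities()) >= 1


def is_valid_response(msg: MultimodalMessage) -> bool:
    """A DPCC output is multi-modality: text, audio, and video together."""
    return msg.modalities() == {"text", "audio", "video"}


if __name__ == "__main__":
    query = MultimodalMessage(text="What is your favorite number?")
    reply = MultimodalMessage(
        text="placeholder reply",
        audio_path="reply.wav",
        video_path="reply.mp4",
    )
    print(is_valid_query(query), is_valid_response(reply))
```

The sketch deliberately leaves out any model: how the personalized text, speech, and video are generated is the baseline's job and is not described in the abstract.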

