Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
This work addresses the problem of generating image captions that reflect specific personality traits, which is important for applications requiring more nuanced and human-like AI communication.
This paper introduces a communication game framework for personality image captioning, in which a speaker generates captions and a listener pushes those captions to be discriminative with respect to both the image and the personality trait. By adapting GPT2 for caption generation, the model achieves state-of-the-art performance on this task.
Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions, while the listener encourages the generated captions to contain discriminative information about the input images and personality traits. In this way, we expect the generated captions to represent the images more naturally and to express the traits. In addition, we propose adapting the language model GPT2 to perform caption generation for PIC. This enables both the speaker and the listener to benefit from the language encoding capacity of GPT2. Our experiments show that the proposed model achieves state-of-the-art performance for PIC.
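As a rough illustration of the speaker-listener idea, the sketch below combines a speaker's caption likelihood term with a listener term that rewards captions from which the correct (image, trait) pair can be identified among the other pairs in a batch. The abstract does not specify the actual objective, so everything here is an assumption made for illustration: the function names `listener_loss` and `game_loss`, the dot-product scoring, the softmax cross-entropy form, and the `weight` parameter are all hypothetical, and plain Python lists stand in for the GPT2-based encodings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def listener_loss(caption_vecs, context_vecs):
    """Hypothetical listener objective (an assumption, not the paper's loss).

    caption_vecs[i] encodes the caption generated for context i, and
    context_vecs[i] encodes the i-th (image, trait) pair. The loss is the
    mean cross-entropy of picking the matching context for each caption,
    using the other contexts in the batch as distractors.
    """
    total = 0.0
    for i, cap in enumerate(caption_vecs):
        scores = [dot(cap, ctx) for ctx in context_vecs]
        top = max(scores)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(caption_vecs)


def game_loss(speaker_nll, caption_vecs, context_vecs, weight=1.0):
    """Speaker negative log-likelihood plus a weighted listener term."""
    return speaker_nll + weight * listener_loss(caption_vecs, context_vecs)


if __name__ == "__main__":
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[4.0, 0.0], [0.0, 4.0]]   # captions that match their contexts
    swapped = [[0.0, 4.0], [4.0, 0.0]]   # captions that match the wrong context
    print(listener_loss(aligned, contexts) < listener_loss(swapped, contexts))
```

Under these assumptions, a caption that fits its own image and trait better than the distractors yields a lower listener term, which is one way the listener could "encourage" discriminative captions during training.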