Generated Contents Enrichment
This work addresses the challenge of generating more detailed and coherent AI-generated content for applications in media and design, though it appears incremental because it builds on existing scene graph and image synthesis methods.
The paper tackles the problem of generating content that is both visually realistic and semantically rich by introducing Generated Contents Enrichment (GCE), which explicitly enriches textual descriptions in visual and textual domains, and demonstrates promising results on the Visual Genome dataset.
In this paper, we investigate a novel artificial intelligence generation task termed Generated Contents Enrichment (GCE). Conventional AI content generation produces visually realistic content from a limited semantic description, enriching the given text only implicitly. In contrast, our proposed GCE performs content enrichment explicitly in both the visual and textual domains. The goal is to generate content that is visually realistic, structurally coherent, and semantically abundant. To tackle GCE, we propose a deep end-to-end adversarial method that explicitly explores semantics and inter-semantic relationships during the enrichment process. Our approach first models the input description as a scene graph, where nodes represent objects and edges capture inter-object relationships. We then apply Graph Convolutional Networks to the input scene graph to predict additional enriching objects and their relationships with the existing ones. Finally, the enriched description is passed to an image synthesis model to generate the corresponding visual content. Experiments on the Visual Genome dataset demonstrate the effectiveness of our method, which produces promising and visually plausible results.
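To make the pipeline described above more concrete, the following is a minimal illustrative sketch, not the paper's implementation. It shows a scene graph stored as objects plus (subject, predicate, object) triples, one mean-aggregation graph convolution step over that graph, and an enrichment step that appends a new object and relation. The names `SceneGraph`, `add_object`, and `graph_conv`, the toy features, the identity weight matrix, and the hand-picked "tree" enrichment are all assumptions made for illustration; in the actual method the enriching objects would be predicted by the trained network, and the aggregation rule may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are objects, edges are relation triples."""
    objects: List[str]
    # Each relation is (subject_index, predicate, object_index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, predicate: str, target_idx: int) -> int:
        """Enrich the graph with a new object linked to an existing one."""
        self.objects.append(name)
        new_idx = len(self.objects) - 1
        self.relations.append((new_idx, predicate, target_idx))
        return new_idx


def graph_conv(features, relations, weight):
    """One graph convolution layer (assumed form): average each node's
    features with its neighbors', apply a linear map, then ReLU."""
    n = len(features)
    neighbors = [[i] for i in range(n)]  # self-loop keeps the node's own features
    for subj, _, obj in relations:
        neighbors[subj].append(obj)  # treat edges as undirected for aggregation
        neighbors[obj].append(subj)
    d_in, d_out = len(weight), len(weight[0])
    out = []
    for i in range(n):
        agg = [sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
               for k in range(d_in)]
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(d_in)))
                    for c in range(d_out)])
    return out


# Input description: "sheep standing on grass, sky above grass".
graph = SceneGraph(["sky", "grass", "sheep"],
                   [(2, "standing on", 1), (0, "above", 1)])

# Toy 2-d node features and an identity weight matrix (placeholders).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = graph_conv(features, graph.relations, identity)

# Placeholder for the model's prediction: a hand-picked enriching object.
graph.add_object("tree", "behind", 2)
```

After this step the enriched graph holds four objects and three relations, and it is this enriched description that would be handed to an image synthesis model. The sketch omits the adversarial training and the prediction heads entirely, since the abstract does not specify them.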