CV | MM | May 24, 2023

Vision + Language Applications: A Survey

arXiv:2305.14598v1 | 16 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This survey serves as a reference for researchers and practitioners interested in vision and language applications, but its contribution is incremental because it primarily compiles existing work.

This survey paper explores the field of text-to-image generation and other multimodal applications, highlighting the limited existing literature and providing a resource for ongoing updates.

Text-to-image generation has attracted significant interest from researchers and practitioners in recent years due to its widespread and diverse applications across various industries. Despite the progress made in the domain of vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others. In addition to the studies discussed in this paper, we are also committed to continually updating the latest relevant papers, datasets, application projects and corresponding information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image

Code Implementations: 1 repo