CV, MM · May 24, 2023

Vision + Language Applications: A Survey

arXiv:2305.14598v1

Originality: Synthesis-oriented

AI Analysis

The survey serves as a reference for researchers and practitioners interested in vision and language applications, but its contribution is incremental, since it primarily compiles existing work.

This survey paper explores the field of text-to-image generation and other multimodal applications, highlighting the limited existing literature and providing a resource for ongoing updates.

Text-to-image generation has attracted significant interest from researchers and practitioners in recent years because of its diverse applications across many industries. Despite the progress made in vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a related research track within multimodal applications spanning text, vision, audio, and other modalities. Beyond the studies discussed in this paper, we are committed to continually updating the latest relevant papers, datasets, application projects, and related information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image
