CV · AI · HC · LG · MM · Nov 15, 2024

Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era

arXiv:2411.09955v2 · 5 citations · h-index: 34 · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the limited accessibility of visual editing for non-experts by providing a comprehensive overview of the field. As a survey, its contribution is incremental: it synthesizes existing work rather than introducing new methods.

This survey synthesizes over 100 publications on instruction-guided editing techniques for images and multimedia, highlighting how large language models and multimodal learning enable intuitive, natural language-based control to democratize visual editing across domains like fashion and 3D scene manipulation.

The rapid advancement of large language models (LLMs) and multimodal learning has transformed digital content creation and manipulation. Traditional visual editing tools require significant expertise, limiting accessibility. Recent strides in instruction-based editing have enabled intuitive interaction with visual content, using natural language as a bridge between user intent and complex editing operations. This survey provides an overview of these techniques, focusing on how LLMs and multimodal models empower users to achieve precise visual modifications without deep technical knowledge. By synthesizing over 100 publications, we explore methods from generative adversarial networks to diffusion models, examining multimodal integration for fine-grained content control. We discuss practical applications across domains such as fashion, 3D scene manipulation, and video synthesis, highlighting increased accessibility and alignment with human intuition. Our survey compares existing literature, emphasizing LLM-empowered editing, and identifies key challenges to stimulate further research. We aim to democratize powerful visual editing across various industries, from entertainment to education. Interested readers are encouraged to access our repository at https://github.com/tamlhp/awesome-instruction-editing.

Code Implementations: 1 repo
