CVOct 26, 2022
DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative ModelsZijie J. Wang, Evan Montoya, David Munechika et al. · gatech, ibm-research
With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.
CVOct 10, 2023
ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuningAlec Helbling, Evan Montoya, Duen Horng Chau · gatech
Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to consistently generate the same objects in different contexts with the same appearance. Consistent object generation is important to many downstream tasks like generating comic book illustrations with consistent characters and setting. Numerous approaches attempt to solve this problem by extending the vocabulary of diffusion models through fine-tuning. However, even lightweight fine-tuning approaches can be prohibitively expensive to run at scale and in real-time. We introduce a method called ObjectComposer for generating compositions of multiple objects that resemble user-specified images. Our approach is training-free, leveraging the abilities of preexisting models. We build upon the recent BLIP-Diffusion model, which can generate images of single objects specified by reference images. ObjectComposer enables the consistent generation of compositions containing multiple specific objects simultaneously, all without modifying the weights of the underlying models.
AISep 21, 2025
Shall We Play a Game? Language Models for Open-ended WargamesGlenn Matlin, Parv Mahajan, Isaac Song et al. · gatech
Wargames are simulations of conflicts in which participants' decisions influence future events. While casual wargaming can be used for entertainment or socialization, serious wargaming is used by experts to explore strategic implications of decision-making and experiential learning. In this paper, we take the position that Artificial Intelligence (AI) systems, such as Language Models (LMs), are rapidly approaching human-expert capability for strategic planning -- and will one day surpass it. Military organizations have begun using LMs to provide insights into the consequences of real-world decisions during _open-ended wargames_ which use natural language to convey actions and outcomes. We argue the ability for AI systems to influence large-scale decisions motivates additional research into the safety, interpretability, and explainability of AI in open-ended wargames. To demonstrate, we conduct a scoping literature review with a curated selection of 100 unclassified studies on AI in wargames, and construct a novel ontology of open-endedness using the creativity afforded to players, adjudicators, and the novelty provided to observers. Drawing from this body of work, we distill a set of practical recommendations and critical safety considerations for deploying AI in open-ended wargames across common domains. We conclude by presenting the community with a set of high-impact open research challenges for future work.