LG · CL · CV · Aug 8, 2023

The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings

arXiv:2308.04052v1 · 22 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of lightweight text-to-image generation for specific low-dimensional domains, though it appears incremental in its approach.

The authors tackle the problem of generating low-dimensional images from text prompts with limited training data. They report accurate and aesthetically pleasing results in domains such as pixel-art game maps, sprites, and down-scaled emoji, with text-image alignment evaluated by CLIP cosine similarity scores.
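The evaluation metric mentioned here, cosine similarity between a text embedding and an image embedding, can be sketched in a few lines. This is a minimal sketch that assumes the embeddings were already produced by CLIP ViT-B/32 (whose joint embedding space is 512-dimensional); the function names, the averaging helper, and the toy 4-dimensional vectors are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_text_image_score(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average cosine similarity over (text_embedding, image_embedding) pairs."""
    scores = [cosine_similarity(t, i) for t, i in pairs]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)


# Toy vectors standing in for CLIP outputs (the real ones are 512-d).
text_emb = [1.0, 0.0, 1.0, 0.0]
image_emb = [1.0, 1.0, 1.0, 1.0]
print(round(cosine_similarity(text_emb, image_emb), 4))  # 0.7071
```

A higher score means the generated image sits closer to its prompt in CLIP's shared embedding space; averaging over many prompts gives a single number per model or dataset.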

The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images from an encoded text prompt. The model can generate accurate and aesthetically pleasing content in low-dimensional domains from limited amounts of training data. Despite the small size of both the model and the datasets, the generated images still preserve the semantic meaning encoded in the text prompt. We apply this model to three small datasets (pixel-art video game maps, video game sprite images, and down-scaled emoji images) and use novel augmentation strategies to improve its performance on these limited datasets. We evaluate our model's performance using the cosine similarity score between text-image pairs, computed with the CLIP ViT-B/32 model.
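The abstract does not spell out the architecture beyond "generates low-dimensional images from an encoded text prompt". As a rough illustration only, the sketch below assumes a hypothetical two-layer MLP (`TinyDecoder`) that maps a precomputed sentence embedding to an H x W x 3 image with values in [0, 1]. The layer sizes, the random initialization, and every name here are assumptions, not the authors' design; the sentence encoder, training, and the paper's augmentation strategies are omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Dense layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(x: Vector) -> Vector:
    """Numerically stable logistic function, squashing each value into (0, 1)."""
    out = []
    for v in x:
        if v >= 0:
            out.append(1.0 / (1.0 + math.exp(-v)))
        else:
            e = math.exp(v)
            out.append(e / (1.0 + e))
    return out


class TinyDecoder:
    """Hypothetical MLP decoder: sentence embedding -> H x W x 3 image in [0, 1]."""

    def __init__(self, embed_dim: int, hidden_dim: int, height: int, width: int, seed: int = 0):
        rng = random.Random(seed)
        self.height, self.width = height, width
        out_dim = height * width * 3
        # Small random weights; a real model would learn these from caption-image pairs.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def __call__(self, embedding: Vector) -> List[List[Vector]]:
        hidden = relu(linear(embedding, self.w1, self.b1))
        flat = sigmoid(linear(hidden, self.w2, self.b2))
        # Reshape the flat vector into rows of pixels, each pixel an [r, g, b] triple.
        return [
            [flat[(r * self.width + c) * 3:(r * self.width + c) * 3 + 3] for c in range(self.width)]
            for r in range(self.height)
        ]


decoder = TinyDecoder(embed_dim=8, hidden_dim=16, height=4, width=4)
image = decoder([0.5] * 8)  # stand-in for an encoded text prompt
print(len(image), len(image[0]), len(image[0][0]))  # 4 4 3
```

Mapping a fixed-size embedding straight to a flat pixel vector keeps the parameter count small at low resolutions, which is one plausible way a model could stay lightweight; whether the five-dollar model works exactly this way is not stated here.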

Code Implementations: 1 repo
