IMLGApr 11, 2025

AstroLLaVA: towards the unification of astronomical data and natural language

arXiv:2504.08583v14 citationsh-index: 9Has Code
Originality Synthesis-oriented
AI Analysis

This work addresses the need for accessible tools in astronomy research and education by providing a domain-specific vision-language model, though it is incremental as it builds on existing methods.

The authors tackled the problem of enabling natural language interaction with astronomical imagery by fine-tuning the LLaVA model on a dataset of ~30k images with captions and question-answer pairs, resulting in a model capable of answering open-ended questions about astronomical concepts.

We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of $\sim$30k images with captions and question-answer pairs sourced from NASA's `Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes