IM LG Apr 11, 2025

AstroLLaVA: towards the unification of astronomical data and natural language

Sharaf Zaman, Michael J. Smith, Pranav Khetarpal, Rishabh Chakrabarty, Michele Ginolfi, Marc Huertas-Company, Maja Jabłońska, Sandor Kruk, Matthieu Le Lain, Sergio José Rodríguez Méndez, Dimitrios Tanoglidis

arXiv:2504.08583v15.14 citations h-index: 9 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for accessible tools in astronomy research and education by providing a domain-specific vision-language model. The contribution is incremental, since it builds on existing methods (fine-tuning LLaVA).

The authors tackled the problem of enabling natural language interaction with astronomical imagery by fine-tuning the LLaVA model on a dataset of ~30k images with captions and question-answer pairs, resulting in a model capable of answering open-ended questions about astronomical concepts.

We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA's 'Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.
