IM CO GA SR CL LG | Jan 3, 2024

AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets

Ernest Perkowski, Rui Pan, Tuan Dung Nguyen, Yuan-Sen Ting, Sandor Kruk, Tong Zhang, Charlie O'Neill, Maja Jablonska, Zechang Sun, Michael J. Smith, Huiling Liu, Kevin Schawinski

arXiv:2401.01916v210.819 citations | h-index: 26 | Has Code | Res Note AA

Originality: Synthesis-oriented

AI Analysis

AstroLLaMA-Chat is an incremental enhancement for the astronomy community: a specialized, open-source conversational AI tool.

The authors aim to improve LLM performance on astronomy question-answering through continual pre-training of a 7B-parameter LLaMA-2 model on curated astronomy corpora, and report notable improvements in specialized topic comprehension. They then fine-tune the model on a conversational dataset and release the result, AstroLLaMA-Chat, as an open-source tool.

We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.
