AstroMLab

AstroMLab is a diverse group of researchers dedicated to advancing the application of Large Language Models (LLMs) in astronomy. Our team includes:

Leading astronomers, astrophysicists, and cosmologists
Natural language processing experts
Frontier arXivists from the NASA Astrophysics Data System

Objectives

Develop specialized LLMs for astronomy
Create open-source models for advanced research
Facilitate LLM-driven end-to-end research in astronomy

Current Work

Our ongoing projects include:

Curation of an astronomy benchmarking dataset
Development of specialized astronomy LLMs
Performance evaluation of models on astronomical tasks

Models and Performance

We have developed several models, including AstroSage-8B, AstroLLaMA-2-70B, and AstroLLaMA-3-8B. Our AstroSage-8B model scores 77.2% on astronomy Q&A tasks, compared with the following models:

Model	Score (%)
AstroSage-8B (AstroMLab)	77.2
LLaMA-3.1-8B	73.7
AstroLLaMA-2-70B (AstroMLab)	72.3
Gemma-2-9B	71.5
Qwen-2.5-7B	70.4
Yi-1.5-9B	68.4
InternLM-2.5-7B	64.0
Mistral-7B-v0.3	63.9
ChatGLM3-6B	50.4
AstroLLaMA-2-7B (UniverseTBD)	44.3

AstroSage-8B, our lightweight model, currently achieves the highest score in this comparison.
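
For readers who want to work with these numbers, the comparison can be reproduced with a minimal Python sketch. The scores are copied from the table above; the helper name `lead_over` is hypothetical and used only for illustration.

```python
# Scores (%) copied from the comparison table above.
scores = {
    "AstroSage-8B": 77.2,
    "LLaMA-3.1-8B": 73.7,
    "AstroLLaMA-2-70B": 72.3,
    "Gemma-2-9B": 71.5,
    "Qwen-2.5-7B": 70.4,
    "Yi-1.5-9B": 68.4,
    "InternLM-2.5-7B": 64.0,
    "Mistral-7B-v0.3": 63.9,
    "ChatGLM3-6B": 50.4,
    "AstroLLaMA-2-7B": 44.3,
}


def lead_over(table, model, other):
    """Hypothetical helper: score gap between two models, in percentage points."""
    return round(table[model] - table[other], 1)


# Rank models from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
top, runner_up = ranked[0], ranked[1]

print(f"{top} leads {runner_up} by {lead_over(scores, top, runner_up)} points")
print(f"{top} leads AstroLLaMA-2-70B by {lead_over(scores, top, 'AstroLLaMA-2-70B')} points")
```

The gaps are differences in percentage points on this benchmark only; the table does not say how they would carry over to other tasks.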

Support and Resources

Our research benefits from:

Access to the Frontier nodes at the Oak Ridge Leadership Computing Facility
Support from Microsoft's Accelerating Foundation Models Research (AFMR) program

Open Science

We are committed to open science principles. Our models are available on Hugging Face.

Contact

For inquiries or collaboration opportunities, please contact: [email protected]