metadata

tags:
  - physics
  - cosmology
model-index:
  - name: cosmosage_qa
    results: []
license: mit
language:
  - en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
datasets:
  - teknium/OpenHermes-2.5

cosmosage

Cosmosage is a natural-language assistant that answers questions about cosmology.

cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks, and was subsequently fine-tuned on synthetically generated question-answer pairs. It is a full chat model, though it excels in Q&A mode, where the model gives a single answer in response to a single question.

The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

Usage

After downloading cosmosage_v2, the following example code can be used to ask questions:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path_to_model = 'cosmosage_v2/'
# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

def ask_cosmosage(question):
    # Prompt layout: system prompt, separator, "USER:" + question, separator, "ASSISTANT:".
    input_ids = torch.cat([
        tokenizer.encode("You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.", return_tensors="pt"),
        torch.tensor([[28705]]),  # separator token ID
        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
        torch.tensor([[28705]]),  # separator token ID
        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
    ], dim=-1).to(device)
    # Sample up to 1000 new tokens at a low temperature.
    generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True, temperature=0.4)
    # The decoded string contains the prompt followed by the generated answer.
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
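
Because `generate` returns the prompt tokens followed by the newly generated ones, the string returned by `ask_cosmosage` contains the system prompt and the question as well as the answer. The following is a minimal sketch of a helper that keeps only the text after the first "ASSISTANT:" marker. The name `extract_answer` is hypothetical and not part of the cosmosage repository, and the helper assumes the question itself does not contain the marker.

```python
def extract_answer(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text that follows the first occurrence of `marker`.

    Hypothetical helper (not from the cosmosage repository). Assumes the
    decoded string has the layout produced by ask_cosmosage:
    system prompt, "USER:" + question, "ASSISTANT:" + answer.
    """
    # partition splits on the first occurrence of the marker.
    _, found, answer = decoded.partition(marker)
    # If the marker is missing, return the whole string unchanged (stripped).
    return answer.strip() if found else decoded.strip()


# Example with a hand-written string standing in for real model output.
demo = "You are cosmosage. USER: What is the CMB? ASSISTANT: The cosmic microwave background."
print(extract_answer(demo))
```

With the model loaded, the same helper could be applied as `extract_answer(ask_cosmosage("What is the CMB?"))`.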

Comparison to cosmosage_v1

cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on the papers and textbooks, rather than only on synthetically generated QA pairs. However, it still struggles with reliability: many of its answers are factually accurate, but some are not. The outputs of cosmosage (or any LLM) should not be trusted to be factual.

Training details

cosmosage_v2 was trained on four A100 (80 GB) GPUs at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).

The following hyperparameters were used during continued pretraining:

learning_rate: 1e-05
train_batch_size: 4
max_grad_norm: 3.0
num_devices: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 3.0
weight_decay: 1e-04

The following hyperparameters were used during QA tuning:

learning_rate: 2e-06
train_batch_size: 4
max_grad_norm: 3.0
num_devices: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 2.0
weight_decay: 0.0
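
To illustrate how the two schedules listed above differ, the sketch below computes the learning rate at a given optimizer step for linear warmup followed by either cosine decay (as in continued pretraining) or linear decay (as in QA tuning). It is an illustration under assumptions, not the training code: it assumes both schedules decay to zero by the last step, and `TOTAL_STEPS = 1000` is a made-up value, since the actual number of training steps is not stated here.

```python
import math


def lr_at_step(step, total_steps, peak_lr, warmup_steps=100, kind="cosine"):
    """Learning rate after `step` optimizer updates.

    Sketch only: linear warmup from 0 to `peak_lr` over `warmup_steps`,
    then decay to zero at `total_steps` (an assumption of this example).
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    if kind == "cosine":
        return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    if kind == "linear":
        return peak_lr * (1.0 - progress)
    raise ValueError(f"unknown schedule: {kind}")


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given

for step in (0, 50, 100, 550, 1000):
    pretrain_lr = lr_at_step(step, TOTAL_STEPS, 1e-05, kind="cosine")
    qa_lr = lr_at_step(step, TOTAL_STEPS, 2e-06, kind="linear")
    print(f"step {step:4d}: pretraining lr = {pretrain_lr:.2e}, QA lr = {qa_lr:.2e}")
```

Both schedules reach their peak learning rate at step 100, the end of warmup. The cosine schedule then stays near the peak for longer before falling, while the linear schedule decreases at a constant rate.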

Examples

USER: What is Digital Active Nulling?
ASSISTANT: Digital Active Nulling (DAN) is a technique used to improve the performance of Transition Edge Sensor (TES) bolometers in millimeter-wave astrophysics. It nulls the current flowing through the SQUID input coil, which is used to read out the TES, by injecting a current at the SQUID input that is the negative of the current through the TES. This results in a significant reduction in the effective input impedance of the SQUID, which increases the dynamic range of the readout system and reduces non-linearity. In addition, DAN replaces the flux-locked loop (FLL), removing the wiring length and bandwidth limitations of the FLL. DAN is used in the Simons Array and SPT-3G experiments, and is the baseline readout technology for the upcoming LiteBIRD satellite.