---
tags:
- physics
- cosmology
model-index:
- name: cosmosage_qa
  results: []
license: mit
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

# cosmosage

Cosmosage is a natural-language assistant that answers questions about cosmology.

cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks,
and was subsequently fine-tuned on synthetically generated question-answer pairs. It is a full
chat model, though it excels in Q&A mode, where the model gives a single answer in response to
a single question.

The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

## Usage

After downloading cosmosage_v2, the following example code can be used to ask questions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the directory containing the downloaded model
path_to_model = 'cosmosage_v2/'

device = "cuda"
model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

def ask_cosmosage(question):
    # Build the prompt as: system prompt, "USER:" + question, "ASSISTANT:".
    # Token ID 28705 is inserted as a separator between the segments.
    input_ids = torch.cat([
        tokenizer.encode("You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.", return_tensors="pt"),
        torch.tensor([[28705]]),
        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
        torch.tensor([[28705]]),
        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
    ], dim=-1).to(device)
    # Sample up to 1000 tokens beyond the prompt at a low temperature
    generated_ids = model.generate(input_ids, max_length=input_ids.shape[1] + 1000, do_sample=True, temperature=0.4)
    # The decoded text contains the prompt followed by the model's answer
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```
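
Because `ask_cosmosage` decodes the full token sequence, the returned string contains the system prompt and the question as well as the answer. The sketch below shows one possible way to keep only the answer. `extract_answer` is a hypothetical helper, not part of the cosmosage repository, and it assumes the literal marker `ASSISTANT:` survives decoding exactly as it was written in the prompt.

```python
def extract_answer(decoded_text, marker="ASSISTANT:"):
    """Return the text after the first occurrence of `marker` (hypothetical helper)."""
    # partition splits on the first occurrence of the marker;
    # `found` is an empty string when the marker is absent.
    _before, found, after = decoded_text.partition(marker)
    if not found:
        # No marker present: fall back to returning the whole text
        return decoded_text.strip()
    return after.strip()

# Hand-written string standing in for real model output
example = "You are cosmosage ... USER: What is the CMB? ASSISTANT: The cosmic microwave background is relic radiation."
print(extract_answer(example))
```

With the model loaded, this could be combined as `extract_answer(ask_cosmosage("What is the CMB?"))`. Note that a question that itself contains the text `ASSISTANT:` would confuse this simple string split.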

## Comparison to cosmosage_v1

cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on the papers and
textbooks, rather than trained only on synthetically generated QA pairs. However, it continues to
struggle with _reliability_. While many of its answers are factually accurate, some are not. The
outputs of cosmosage (or any LLM) should not be trusted to be factual.

### Training hyperparameters

The following hyperparameters were used during continued pretraining:

- learning_rate: 1e-05
- max_grad_norm: 3.0
- train_batch_size: 4
- eval_batch_size: 4
- seed: 701
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
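
To make the batch-size and schedule settings above concrete, the sketch below computes the effective batch size and the learning rate at a given optimizer step for a cosine schedule with linear warmup. It assumes the common formulation in which the rate rises linearly from 0 over the warmup steps and then follows half a cosine period down to 0, and it assumes no gradient accumulation. The total step count `TOTAL_STEPS` is a made-up value, since the dataset size is not stated here.

```python
import math

PEAK_LR = 1e-5        # learning_rate
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps
PER_DEVICE_BATCH = 4  # train_batch_size
NUM_DEVICES = 4       # num_devices
TOTAL_STEPS = 10_000  # hypothetical: the real number of steps is not given

def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Samples per optimizer step (grad_accum_steps=1 is an assumption)."""
    return per_device * num_devices * grad_accum_steps

def cosine_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step` for linear warmup followed by cosine decay to 0."""
    if step < warmup:
        # Linear ramp from 0 up to the peak learning rate
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES))  # 16, matching total_train_batch_size
for step in (0, 50, 100, 5050, 10_000):
    print(step, cosine_lr(step))
```

The rate peaks at `1e-05` when warmup ends (step 100) and falls to half that value midway through the decay phase.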

The following hyperparameters were used during QA tuning:

- learning_rate: 2e-06
- max_grad_norm: 3.0
- train_batch_size: 4
- eval_batch_size: 4
- seed: 702
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0
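
The two stages share most of their settings. As a small sketch, the snippet below stores both hyperparameter lists as plain dictionaries and prints only the entries that differ. The values are copied from the lists above; the variable and function names are illustrative.

```python
# Settings common to both stages
shared = {
    "max_grad_norm": 3.0,
    "train_batch_size": 4,
    "eval_batch_size": 4,
    "num_devices": 4,
    "total_train_batch_size": 16,
    "total_eval_batch_size": 16,
    "lr_scheduler_warmup_steps": 100,
}

pretraining = {
    **shared,
    "learning_rate": 1e-5,
    "seed": 701,
    "lr_scheduler_type": "cosine",
    "num_epochs": 3.0,
    "weight_decay": 1e-4,
}

qa_tuning = {
    **shared,
    "learning_rate": 2e-6,
    "seed": 702,
    "lr_scheduler_type": "linear",
    "num_epochs": 2.0,
    "weight_decay": 0.0,
}

def diff_settings(a, b):
    """Map each key whose value differs to a (value_in_a, value_in_b) pair."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}

changed = diff_settings(pretraining, qa_tuning)
for name, (pre, qa) in changed.items():
    print(f"{name}: {pre} -> {qa}")
```

The QA tuning stage uses a learning rate five times lower, one fewer epoch, a linear rather than cosine schedule, and no weight decay.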