---
tags:
- physics
- cosmology
model-index:
- name: cosmosage_qa
results: []
license: mit
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---
# cosmosage
cosmosage is a natural-language assistant that answers questions about cosmology.
cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks,
and was subsequently fine-tuned on synthetically generated question-answer pairs. It is a full
chat model, though it excels in Q&A mode, where the model gives a single answer in response to
a single question.
The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage
## Usage
After downloading cosmosage_v2, the following example code can be used to ask questions:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path_to_model = 'cosmosage_v2/'  # local directory containing the downloaded model
device = "cuda"

model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

def ask_cosmosage(question):
    # Build the prompt: system prompt, "USER:" + question, then "ASSISTANT:".
    # Token id 28705 is inserted between the segments.
    input_ids = torch.cat([
        tokenizer.encode("You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.", return_tensors="pt"),
        torch.tensor([[28705]]),
        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
        torch.tensor([[28705]]),
        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
    ], dim=-1).to(device)
    # Sample up to 1000 tokens beyond the prompt length.
    generated_ids = model.generate(input_ids, max_length=input_ids.shape[1] + 1000, do_sample=True)
    # The decoded string contains the prompt followed by the generated answer.
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```
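Because `generate` returns the prompt tokens followed by the continuation, the string returned by `ask_cosmosage` includes the system prompt and the question as well as the answer. The helper below is a minimal sketch of one way to keep only the answer. The name `extract_answer` is hypothetical and not part of the cosmosage code, and it assumes the question itself does not contain the text `ASSISTANT:`.

```python
def extract_answer(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text after the first occurrence of `marker`.

    Hypothetical helper, not part of cosmosage. If the marker is missing,
    the input is returned unchanged (apart from surrounding whitespace).
    """
    before, found, after = decoded.partition(marker)
    if not found:
        return decoded.strip()
    return after.strip()
```

With the model loaded, this could be used as `extract_answer(ask_cosmosage("What is the CMB?"))`.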
## Comparison to cosmosage_v1
cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on papers and
textbooks rather than only on synthetically generated QA pairs. However, it continues to struggle with
_reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
(or any LLM) should not be trusted to be factual.
## Training hyperparameters
The following hyperparameters were used during continued pretraining:
- learning_rate: 1e-05
- max_grad_norm: 3.0
- train_batch_size: 4
- eval_batch_size: 4
- seed: 701
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
The following hyperparameters were used during QA tuning:
- learning_rate: 2e-06
- max_grad_norm: 3.0
- train_batch_size: 4
- eval_batch_size: 4
- seed: 702
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0
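
To make the two schedules concrete, the following is a rough sketch of how the learning rate might evolve under each setting. It assumes the common convention of a linear warmup from zero over `lr_scheduler_warmup_steps`, followed by cosine or linear decay to zero. The function name `lr_at_step` and the `total_steps` value are hypothetical, since the number of optimizer steps is not stated above. The total batch size of 16 is consistent with 4 devices at a per-device batch size of 4, assuming no gradient accumulation.

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_steps=100, kind="cosine"):
    """Learning rate after `step` optimizer steps (illustrative sketch only)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if kind == "cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    if kind == "linear":
        return base_lr * max(0.0, 1.0 - progress)
    raise ValueError(f"unknown schedule kind: {kind}")

# Per-device batch size times number of devices gives the total batch size.
train_batch_size, num_devices = 4, 4
total_train_batch_size = train_batch_size * num_devices

# total_steps=1000 is a made-up value for illustration.
pretrain_peak = lr_at_step(100, 1000, base_lr=1e-05, kind="cosine")
qa_midpoint = lr_at_step(550, 1000, base_lr=2e-06, kind="linear")
```

Under these assumptions, both runs reach their listed `learning_rate` at step 100 and decay toward zero by the final step; only the shape of the decay differs.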