---
tags:
- physics
- cosmology
model-index:
- name: cosmosage_qa
  results: []
license: mit
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
datasets:
- teknium/OpenHermes-2.5
---

# cosmosage

Cosmosage is a natural-language assistant that answers questions about cosmology.

cosmosage_v2 first underwent continued pretraining on thousands of papers and textbooks,
and was subsequently fine-tuned on synthetically generated question-answer pairs. It is a full
chat model, though it excels in Q&A mode, where the model gives a single answer in response to
a single question.

The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

## Usage

After downloading cosmosage_v2, the following example code can be used to ask questions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path_to_model = 'cosmosage_v2/'
device = "cuda"  # change to "cpu" if no GPU is available

model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(path_to_model)

SYSTEM_PROMPT = "You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference."

def ask_cosmosage(question):
    # Prompt layout: <system prompt> <sep> USER: <question> <sep> ASSISTANT:
    # Token id 28705 is inserted as the separator between segments.
    separator = torch.tensor([[28705]])
    input_ids = torch.cat([
        tokenizer.encode(SYSTEM_PROMPT, return_tensors="pt"),
        separator,
        tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
        tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
        separator,
        tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt"),
    ], dim=-1).to(device)
    # Sample up to 1000 new tokens at a low temperature.
    generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True, temperature=0.4)
    # The decoded string contains the prompt followed by the generated answer.
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```
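
Note that `ask_cosmosage` decodes the whole generated sequence, so the returned string contains the system prompt and the question as well as the answer. The following is a minimal sketch for keeping only the answer, assuming the decoded text still contains the literal `ASSISTANT:` marker from the prompt. `extract_answer` is a hypothetical helper introduced here for illustration and is not part of the cosmosage repository.

```python
def extract_answer(decoded: str, marker: str = "ASSISTANT:") -> str:
    """Return the text that follows the first occurrence of `marker`.

    Hypothetical helper: assumes the decoded output keeps the literal
    marker used when building the prompt. If the marker is missing,
    the input is returned with surrounding whitespace stripped.
    """
    _, found, answer = decoded.partition(marker)
    return answer.strip() if found else decoded.strip()
```

It could then be used as `extract_answer(ask_cosmosage("What is the cosmic microwave background?"))`. Splitting on the first marker keeps everything the model generated, including any follow-up turns it may have produced on its own.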

## Comparison to cosmosage_v1

cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on papers and
textbooks rather than only on synthetically generated QA pairs. However, it continues to struggle with
_reliability_: many of its answers are factually accurate, but some are not. The outputs of cosmosage
(or any LLM) should not be trusted to be factual.

### Training details

cosmosage_v2 was trained on four A100 (80 GB) GPUs at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).

The following hyperparameters were used during continued pretraining:
- learning_rate: 1e-05
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
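
To make the `cosine` scheduler with 100 warmup steps concrete, here is a rough sketch of the usual shape: a linear ramp from zero to the peak learning rate, followed by half a cosine period down to zero. The function name `cosine_lr_with_warmup` and the value of `total_steps` are assumptions for illustration; this card does not state the number of optimizer steps or whether the decay reached exactly zero.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=1e-5, warmup_steps=100):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

total_steps = 1000  # hypothetical; the real step count is not given in this card
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

With these assumed numbers, the learning rate reaches its peak of 1e-05 at step 100 and falls to half of that midway through the remaining steps.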

The following hyperparameters were used during QA tuning:
- learning_rate: 2e-06
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0
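
The two stages differ mainly in learning rate, scheduler, epoch count, and weight decay. The sketch below records both sets of values side by side and checks that the listed total batch size of 16 follows from the per-device batch size and the device count. `StageConfig` is a hypothetical container written for this comparison, and `gradient_accumulation_steps = 1` is an assumption inferred from 4 x 4 = 16, not a value stated in this card.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    name: str
    learning_rate: float
    train_batch_size: int  # per-device batch size
    num_devices: int
    lr_scheduler_type: str
    num_epochs: float
    weight_decay: float
    gradient_accumulation_steps: int = 1  # assumption: not stated in the card

    @property
    def total_train_batch_size(self) -> int:
        # Effective batch size per optimizer step across all devices.
        return (self.train_batch_size * self.num_devices
                * self.gradient_accumulation_steps)

pretrain = StageConfig("continued pretraining", 1e-5, 4, 4, "cosine", 3.0, 1e-4)
qa_tune = StageConfig("QA tuning", 2e-6, 4, 4, "linear", 2.0, 0.0)

for stage in (pretrain, qa_tune):
    print(f"{stage.name}: total batch size {stage.total_train_batch_size}, "
          f"peak lr {stage.learning_rate:g}, scheduler {stage.lr_scheduler_type}")
```

The QA tuning stage uses a peak learning rate five times smaller than continued pretraining, with fewer epochs and no weight decay.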

### Examples

```raw
USER: What is Digital Active Nulling?
ASSISTANT: Digital Active Nulling (DAN) is a technique used to improve the performance of Transition Edge Sensor (TES) bolometers in millimeter-wave astrophysics. It nulls the current flowing through the SQUID input coil, which is used to read out the TES, by injecting a current at the SQUID input that is the negative of the current through the TES. This results in a significant reduction in the effective input impedance of the SQUID, which increases the dynamic range of the readout system and reduces non-linearity. In addition, DAN replaces the flux-locked loop (FLL), removing the wiring length and bandwidth limitations of the FLL. DAN is used in the Simons Array and SPT-3G experiments, and is the baseline readout technology for the upcoming LiteBIRD satellite.
```