---
language:
- en
license: mit
tags:
- physics
- cosmology
datasets:
- teknium/OpenHermes-2.5
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: cosmosage_qa
results: []
---
# cosmosage
cosmosage is a natural-language assistant that answers questions about cosmology.
cosmosage_v2 first underwent continued pretraining based on thousands of papers and textbooks,
and was subsequently fine-tuned on synthetically-generated question-answer pairs. It is a full
chat model, though it excels in Q&A mode, where the model gives a single answer in response to
a single question.
The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage
## Usage
After downloading cosmosage_v2, the following example code can be used to ask questions:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/cosmosage_v2/"
device = torch.device("cuda")

# Load the weights and move them to the GPU in bfloat16.
model = AutoModelForCausalLM.from_pretrained(model_path).to(device, dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_path)


def ask_cosmosage(question):
    # The prompt sections are separated by U+2581 (▁); see "Instruction format".
    prompt = f"You are cosmosage, an AI programmed to provide excellent and detailed answers to the user's question. You are an expert cosmology assistant, able to answer questions on the cosmic microwave background, galaxy formation, large scale structure, theoretical cosmology, inflation, big bang nucleosynthesis, cosmology instrumentation, and other related topics. Please assume the user is fluent in scientific terminology. Elaborate where possible to give a complete answer. If you do not know, say you do not know.▁ USER: {question}▁ ASSISTANT:"
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    generated_ids = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.7, top_k=None, pad_token_id=tokenizer.eos_token_id)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    # Keep only the text after the final "ASSISTANT:" marker.
    answer = generated_text.split("ASSISTANT:")[-1]
    return answer
```
## Comparison to cosmosage_v1
cosmosage_v2 is more knowledgeable than cosmosage_v1 because it was pretrained on papers and
textbooks rather than only on synthetically generated QA pairs. However, it continues to struggle with
_reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
(or any LLM) should not be trusted to be factual.
## Training details
cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
The following hyperparameters were used during continued pretraining:
- learning_rate: 1e-05
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
- weight_decay: 1e-04
The following hyperparameters were used during QA tuning:
- learning_rate: 2e-06
- train_batch_size: 4
- max_grad_norm: 3.0
- num_devices: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
- weight_decay: 0.0
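The two schedules listed above can be sketched in plain Python. This is a minimal illustration, assuming the common convention of a linear warmup from zero followed by a cosine or linear decay to zero; whether the training run used exactly this shape is an assumption, and `lr_at_step` and `total_steps` are hypothetical names and values (the number of optimizer steps is not stated here). The batch-size line simply checks that 4 per device on 4 devices gives the listed total of 16, which is consistent with no gradient accumulation.

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_steps=100, schedule="cosine"):
    """Learning rate at a given optimizer step: linear warmup, then decay to zero.

    Hypothetical helper for illustration; not taken from the training code.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    if schedule == "cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    if schedule == "linear":
        return base_lr * (1.0 - progress)
    raise ValueError(f"unknown schedule: {schedule}")


# Effective batch size implied by the lists above: 4 per device on 4 devices.
total_train_batch_size = 4 * 4

# total_steps is an assumed value chosen only to make the example concrete.
total_steps = 1000
pretrain_peak = lr_at_step(100, total_steps, 1e-05, schedule="cosine")
qa_midpoint = lr_at_step(550, total_steps, 2e-06, schedule="linear")
print(total_train_batch_size, pretrain_peak, qa_midpoint)
```

Under these assumptions the learning rate reaches its listed peak at step 100 in both phases and falls to zero at the final step.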
## Versions
This repository contains:
- pytorch_model.bin: standard version (bfloat16)
- model.safetensors: same as pytorch_model.bin but in safetensors format
- gptq_model-8bit-128g.safetensors: 8-bit quantized version for inference speedup and low-VRAM GPUs
- gptq_model-4bit-128g.safetensors: 4-bit quantized version for even faster inference, lower VRAM requirements, lower quality
When using one of the quantized versions, make sure to pass the quantization configuration:
```json
{
"bits": <4 or 8 depending on the version>,
"group_size": 128,
"damp_percent": 0.01,
"desc_act": false,
"static_groups": false,
"sym": true,
"true_sequential": true,
"model_name_or_path": null,
"model_file_base_name": null
}
```
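Since the only field that differs between the two quantized files is `bits`, the configuration can be generated rather than edited by hand. The following is a minimal sketch using only the standard library; `make_quantize_config` is a hypothetical helper name, and how the resulting JSON is handed to a particular GPTQ loader is left open because it depends on the library in use.

```python
import json


def make_quantize_config(bits):
    """Return the quantization configuration above for the 4- or 8-bit file."""
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    return {
        "bits": bits,
        "group_size": 128,
        "damp_percent": 0.01,
        "desc_act": False,
        "static_groups": False,
        "sym": True,
        "true_sequential": True,
        "model_name_or_path": None,
        "model_file_base_name": None,
    }


# Serialize the 4-bit variant; Python's None/True/False become null/true/false.
config_json = json.dumps(make_quantize_config(4), indent=2)
print(config_json)
```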
## Instruction format
cosmosage_v2 was trained with the "inst" chat template as implemented in axolotl v0.4.0. This resulted in an
unusual instruction format:
```raw
<s> {system prompt}▁ USER: {question}▁ ASSISTANT:
```
- Be sure not to leave any whitespace after the final ":"
- Note the use of the U+2581 Lower One Eighth Block Unicode Character to separate the different sections.
- The example code in the Usage section above correctly implements this format.
- cosmosage_v2 does not appear to be sensitive to minor deviations from this format.
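The format rules above can be wrapped in small helpers so that the separator and the trailing colon are not typed by hand. This is a sketch, not part of the released code: `build_prompt` and `extract_answer` are hypothetical names, and the leading `<s>` is assumed to be added by the tokenizer, as in the Usage example.

```python
SEPARATOR = "\u2581"  # U+2581 LOWER ONE EIGHTH BLOCK


def build_prompt(system_prompt, question):
    """Assemble a prompt in the instruction format described above.

    The result ends exactly at the final ":" with no trailing whitespace.
    """
    return f"{system_prompt}{SEPARATOR} USER: {question}{SEPARATOR} ASSISTANT:"


def extract_answer(generated_text):
    """Return the text after the last "ASSISTANT:" marker, stripped of whitespace."""
    return generated_text.split("ASSISTANT:")[-1].strip()


prompt = build_prompt("You are cosmosage.", "What is the CMB?")
print(repr(prompt))
```

Escaping the separator as `\u2581` keeps the character from being silently altered by editors or encodings that do not handle it well.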
## Example output
**User:**
> What is Digital Active Nulling?
**Assistant:**
> Digital Active Nulling (DAN) is a technique used to improve the performance of Transition Edge Sensor (TES) bolometers in millimeter-wave astrophysics. It nulls the current flowing through the SQUID input coil, which is used to read out the TES, by injecting a current at the SQUID input that is the negative of the current through the TES. This results in a significant reduction in the effective input impedance of the SQUID, which increases the dynamic range of the readout system and reduces non-linearity. In addition, DAN replaces the flux-locked loop (FLL), removing the wiring length and bandwidth limitations of the FLL. DAN is used in the Simons Array and SPT-3G experiments, and is the baseline readout technology for the upcoming LiteBIRD satellite.
**User:**
> Explain the ISW effect.
**Assistant:**
> 1. The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in cosmology that describes the contribution of gravitational potential fluctuations to the Cosmic Microwave Background (CMB) anisotropy. It is named after Rainer K. Sachs and Arthur M. Wolfe, who first described the effect in their 1967 paper.
> 2. The ISW effect occurs when CMB photons travel through a time-varying gravitational potential due to the expansion of the Universe. As the photons pass through regions where the gravitational potential is changing, their energy is altered, leading to a change in their temperature.
> 3. There are two main phases in which the ISW effect can occur: the early-time effect and the late-time effect.
> 4. The early-time ISW effect, also known as the Rees-Sciama effect, occurs during the transition from the radiation-dominated era to the matter-dominated era in the early Universe. As the Universe expands and the density of radiation decreases, the gravitational potentials begin to decay, causing CMB photons to gain energy and contribute to the anisotropy.
> 5. The late-time ISW effect occurs when the Universe transitions to the current dark energy-dominated era. As the Universe expands and the dark energy starts to dominate, the gravitational potentials decay again, causing CMB photons to gain energy and contribute to the anisotropy.
> 6. The ISW effect is challenging to detect because it is a secondary anisotropy in the CMB, meaning it is superimposed on the primary anisotropies that formed during the early Universe. Additionally, the ISW effect has a similar angular scale as other CMB anisotropies, making it difficult to isolate from other sources.
> 7. The detection of the ISW effect has been achieved through cross-correlating the CMB with large-scale structure (LSS) surveys, such as galaxy surveys. These cross-correlations can reveal the ISW signal by comparing the distribution of galaxies with the CMB anisotropies.
> 8. The ISW effect is an important tool for studying the large-scale structure of the Universe, the properties of dark energy, and the evolution of cosmic structures. It also provides a way to test and constrain cosmological models, such as the ΛCDM model.
**User:**
> Explain how one would calculate the helium fraction at the surface of last scattering.
**Assistant:**
> 1. Determine the redshift of the surface of last scattering (zLS) using the CMB temperature of 2.7255 K and the current temperature of the universe T0 = 2.7255 K. The surface of last scattering is approximately 1380 K, which corresponds to a redshift of zLS = 1089.9.
> 2. Calculate the scale factor at the surface of last scattering (aLS) using the redshift. The scale factor at the surface of last scattering is approximately aLS = 1/(1 + zLS) = 1/(1 + 1089.9) ≈ 9.143 × 10^-4.
> 3. Determine the Hubble parameter at the surface of last scattering (HLS). The Hubble parameter is related to the Hubble constant (H0) and the scale factor (a) by the equation H(z) = H0 * a^(-3/2). Since the radiation density is much larger than the matter density at the surface of last scattering, we can use the following approximation for the Hubble parameter: H(z) ≈ H0 * sqrt(1 + z) ≈ H0 * sqrt(1 + 1089.9) ≈ 1.06 × 10^4 * H0.
> 4. Calculate the age of the universe at the surface of last scattering (tLS). The age of the universe can be calculated using the Hubble parameter and the scale factor: t = 1/H. At the surface of last scattering, the age of the universe is approximately tLS ≈ 1/(1.06 × 10^4 * H0) ≈ 9.4 × 10^-10 * H0.
> 5. Determine the neutron-to-proton ratio at the surface of last scattering (n/p ratio). The neutron-to-proton ratio can be calculated using the equation n/p = exp(-Q/kT), where Q is the neutron-proton mass difference (1.293 MeV), k is the Boltzmann constant, and T is the temperature at the surface of last scattering. At the surface of last scattering, the neutron-to-proton ratio is approximately n/p ≈ exp(-1.293/0.801) ≈ 0.198.
> 6. Calculate the helium fraction (Yp) at the surface of last scattering (YpLS). The helium fraction can be calculated using the following equation: Yp = 2 * n/p, where n is the number of neutrons and p is the number of protons. At the surface of last scattering, the helium fraction is approximately YpLS ≈ 2 * 0.198 ≈ 0.396.
> 7. Determine the helium fraction at the surface of last scattering (YpLS). The helium fraction at the surface of last scattering is approximately YpLS β 0.396.
**User:**
> How does the time of matter-radiation equality affect the damping tail?
**Assistant:**
> The time of matter-radiation equality affects the damping tail by influencing the amount of damping that occurs in the CMB power spectrum. The damping tail is the region of the CMB power spectrum that is suppressed due to the diffusion of photons in the early universe. This diffusion is caused by the interaction of photons with free electrons, which scatter the photons and reduce their energy. The amount of damping that occurs is determined by the amount of time that the photons spend in this diffusive state, which is influenced by the time of matter-radiation equality.
>
> Matter-radiation equality is the point in the history of the universe when the energy density of matter becomes equal to the energy density of radiation. This occurs when the universe is approximately 50,000 years old. The time of matter-radiation equality affects the damping tail because it determines the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe.
>
> If matter-radiation equality occurs later in the history of the universe, then the photons spend more time in the diffusive state, which results in more damping of the CMB power spectrum. This is because the longer the photons spend in the diffusive state, the more time they have to lose energy through scattering with free electrons. Conversely, if matter-radiation equality occurs earlier in the history of the universe, then the photons spend less time in the diffusive state, which results in less damping of the CMB power spectrum.
>
> In summary, the time of matter-radiation equality affects the damping tail of the CMB power spectrum by influencing the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe. The longer the photons spend in the diffusive state, the more damping occurs, and the earlier matter-radiation equality occurs, the less damping occurs.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Tijmen2__cosmosage_v2)
| Metric |Value|OpenHermes2.5-Mistral7B|
|---------------------------------|----:|----------------------:|
|Avg. |60.66|61.52|
|AI2 Reasoning Challenge (25-Shot)|59.73|64.93|
|HellaSwag (10-Shot) |80.90|84.18|
|MMLU (5-Shot) |59.57|63.64|
|TruthfulQA (0-shot) |50.98|52.24|
|Winogrande (5-shot) |75.93|78.06|
|GSM8k (5-shot) |36.85|26.08|
cosmosage_v2 can be compared to OpenHermes-2.5-Mistral-7B because it started from the same base model and was also trained on the OpenHermes-2.5 dataset.
cosmosage_v2 performs only slightly below OpenHermes-2.5-Mistral-7B on most metrics, indicating that the
heavy specialization in cosmology has not come at much of a cost on general-purpose abilities. The exception
is GSM8k, which is a collection of grade school math problems. Here, cosmosage performs significantly better
than OpenHermes-2.5-Mistral-7B (36.85 vs. 26.08).
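
As a quick consistency check on the table above, the "Avg." row can be recomputed from the six benchmark scores, along with the per-benchmark differences. This is a minimal sketch that assumes the average is an unweighted mean; the dictionary keys are shortened labels chosen for this example.

```python
# (cosmosage_v2, OpenHermes-2.5-Mistral-7B) scores copied from the table above.
scores = {
    "ARC": (59.73, 64.93),
    "HellaSwag": (80.90, 84.18),
    "MMLU": (59.57, 63.64),
    "TruthfulQA": (50.98, 52.24),
    "Winogrande": (75.93, 78.06),
    "GSM8k": (36.85, 26.08),
}

# Unweighted means over the six benchmarks, rounded as in the table.
cosmosage_avg = round(sum(c for c, _ in scores.values()) / len(scores), 2)
openhermes_avg = round(sum(o for _, o in scores.values()) / len(scores), 2)

# Positive difference means cosmosage_v2 scores higher on that benchmark.
differences = {name: round(c - o, 2) for name, (c, o) in scores.items()}
wins = [name for name, diff in differences.items() if diff > 0]

print(cosmosage_avg, openhermes_avg)
print(differences)
print(wins)
```

The recomputed means match the table (60.66 and 61.52), and GSM8k is the only benchmark on which cosmosage_v2 scores higher.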