Update README.md
README.md (CHANGED)
@@ -10,6 +10,8 @@ language:
 - en
 pipeline_tag: text-generation
 base_model: mistralai/Mistral-7B-v0.1
+datasets:
+- teknium/OpenHermes-2.5
 ---
 
 # cosmosage
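The dataset added to the card metadata, `teknium/OpenHermes-2.5`, is a public Hub dataset. A minimal sketch for pulling it with the `datasets` library follows; the `train` split name and the act of inspecting a record are assumptions, since the card only names the dataset.

```python
# Minimal sketch: load the QA-tuning dataset now listed in the card metadata.
# Assumes the default "train" split; the record layout is not specified by the card.
from datasets import load_dataset

openhermes = load_dataset("teknium/OpenHermes-2.5", split="train")
print(openhermes[0])  # inspect one record
```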
@@ -55,18 +57,16 @@ textbooks, rather than just on synthetically generated QA pairs. However, it con
 _reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
 (or any LLM) should not be trusted to be factual.
 
-### Training
+### Training details
 
-The following hyperparameters were used during continued pretraining:
+cosmosage_v2 was trained on 4xA100 (80 GB) at the Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan (NAOJ).
+
+The following parameters were used during continued pretraining:
 - learning_rate: 1e-05
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 701
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
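The continued-pretraining hyperparameters in the hunk above map naturally onto Hugging Face `TrainingArguments`. The sketch below is only an illustration of that mapping; the card does not say which training framework was actually used, and `output_dir` is a placeholder.

```python
# Illustrative mapping only: the card lists hyperparameters but not the training
# framework, so treat this as a sketch, not the actual training script.
from transformers import TrainingArguments

pretrain_args = TrainingArguments(
    output_dir="cosmosage_v2_pretrain",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # 4 per device x 4 GPUs = total_train_batch_size 16
    max_grad_norm=3.0,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999), epsilon=1e-08 as listed
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
)
```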
@@ -75,14 +75,10 @@ The following hyperparameters were used during continued pretraining:
 
 The following hyperparameters were used during QA tuning:
 - learning_rate: 2e-06
-- max_grad_norm: 3.0
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 702
-- distributed_type: multi-GPU
+- max_grad_norm: 3.0
 - num_devices: 4
 - total_train_batch_size: 16
-- total_eval_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 100
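For the QA-tuning stage only the learning rate and scheduler shape differ; a matching sketch, under the same assumptions as the continued-pretraining example, is below.

```python
# Same caveats as above: an illustrative mapping of the listed QA-tuning values,
# not the actual training script. Run across the card's 4 GPUs (e.g. with
# torchrun or accelerate) so the per-device batch of 4 gives the total of 16.
from transformers import TrainingArguments

qa_args = TrainingArguments(
    output_dir="cosmosage_v2_qa",    # placeholder
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    max_grad_norm=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
)
```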