timpal0l committed on
Commit 2ac5533 · verified · 1 Parent(s): b65cbf8

Update README.md

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -61,7 +61,7 @@ It was trained on a subset from [The Nordic Pile](https://arxiv.org/abs/2303.171
 The training dataset consists of 227 105 079 296 tokens. The model was trained on the Rattler supercomputer at the Dell Technologies Edge Innovation Center in Austin, Texas. Training ran on 23 nodes for a duration of 30 days, where each node contained 4x Nvidia A100 GPUs, yielding 92 GPUs in total.
 
 ## trainer.yaml:
-```bash
+```yaml
 learning_rate: 2e-5
 warmup_steps: 100
 lr_scheduler: cosine
@@ -72,6 +72,37 @@ micro_batch_size: 1
 num_epochs: 1
 sequence_len: 8192
 ```
+
+## deepspeed_zero2.json:
+```json
+{
+  "zero_optimization": {
+    "stage": 2,
+    "offload_optimizer": {
+      "device": "cpu"
+    },
+    "contiguous_gradients": true,
+    "overlap_comm": true
+  },
+  "bf16": {
+    "enabled": "auto"
+  },
+  "fp16": {
+    "enabled": "auto",
+    "auto_cast": false,
+    "loss_scale": 0,
+    "initial_scale_power": 32,
+    "loss_scale_window": 1000,
+    "hysteresis": 2,
+    "min_loss_scale": 1
+  },
+  "gradient_accumulation_steps": "auto",
+  "gradient_clipping": "auto",
+  "train_batch_size": "auto",
+  "train_micro_batch_size_per_gpu": "auto",
+  "wall_clock_breakdown": false
+}
+```
 ![](https://huggingface.co/AI-Sweden-Models/Llama-3-8B/resolve/main/13333333.jpg?download=true)
 
 ## Checkpoints
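
For readers wondering how the two configuration files in this diff relate to each other: the keys in trainer.yaml are ordinary trainer hyperparameters, and the "auto" placeholders in deepspeed_zero2.json are meant to be filled in by whichever launcher consumes them. The commit does not say which training framework was used, so the following is only a minimal sketch assuming the Hugging Face `transformers` Trainer; the `output_dir` value and the `bf16=True` choice are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch only: the commit does not state which framework reads
# trainer.yaml and deepspeed_zero2.json. Here we assume the Hugging Face
# `transformers` Trainer; output_dir and bf16=True are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3-8b-continued",  # hypothetical output path
    learning_rate=2e-5,                  # trainer.yaml: learning_rate
    warmup_steps=100,                    # trainer.yaml: warmup_steps
    lr_scheduler_type="cosine",          # trainer.yaml: lr_scheduler
    per_device_train_batch_size=1,       # trainer.yaml: micro_batch_size
    num_train_epochs=1,                  # trainer.yaml: num_epochs
    bf16=True,                           # resolves the "auto" bf16/fp16 blocks in the JSON
    deepspeed="deepspeed_zero2.json",    # ZeRO stage 2, optimizer states offloaded to CPU
)
# sequence_len: 8192 is not a TrainingArguments field; it would be applied when
# tokenizing/packing the dataset (e.g. max_length=8192). The "auto" batch-size,
# gradient-accumulation and gradient-clipping entries in the JSON are likewise
# filled in from TrainingArguments at launch time.
```

In this setup, ZeRO stage 2 shards optimizer states and gradients across the GPUs, and `offload_optimizer: cpu` moves the optimizer state into host memory to reduce per-GPU memory pressure.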