timpal0l committed on
Commit 2ac5533 · verified · 1 Parent(s): b65cbf8

Update README.md

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -61,7 +61,7 @@ It was trained on a subset from [The Nordic Pile](https://arxiv.org/abs/2303.171
 The training dataset consists of 227 105 079 296 tokens. The model was trained on the Rattler supercomputer at the Dell Technologies Edge Innovation Center in Austin, Texas. Training ran on 23 nodes for a duration of 30 days, where each node contained 4x Nvidia A100 GPUs, yielding 92 GPUs in total.
 
 ## trainer.yaml:
-```bash
+```yaml
 learning_rate: 2e-5
 warmup_steps: 100
 lr_scheduler: cosine
@@ -72,6 +72,37 @@ micro_batch_size: 1
 num_epochs: 1
 sequence_len: 8192
 ```
+
+## deepspeed_zero2.json:
+```json
+{
+  "zero_optimization": {
+    "stage": 2,
+    "offload_optimizer": {
+      "device": "cpu"
+    },
+    "contiguous_gradients": true,
+    "overlap_comm": true
+  },
+  "bf16": {
+    "enabled": "auto"
+  },
+  "fp16": {
+    "enabled": "auto",
+    "auto_cast": false,
+    "loss_scale": 0,
+    "initial_scale_power": 32,
+    "loss_scale_window": 1000,
+    "hysteresis": 2,
+    "min_loss_scale": 1
+  },
+  "gradient_accumulation_steps": "auto",
+  "gradient_clipping": "auto",
+  "train_batch_size": "auto",
+  "train_micro_batch_size_per_gpu": "auto",
+  "wall_clock_breakdown": false
+}
+```
 ![](https://huggingface.co/AI-Sweden-Models/Llama-3-8B/resolve/main/13333333.jpg?download=true)
 
 ## Checkpoints
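
For readers wondering how the two configuration files in this diff relate to each other: the keys in trainer.yaml are ordinary trainer hyperparameters, and the "auto" placeholders in deepspeed_zero2.json are meant to be filled in by whichever launcher consumes them. The commit does not say which training framework was used, so the following is only a minimal sketch assuming the Hugging Face `transformers` Trainer; the `output_dir` value and the `bf16=True` choice are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch only: the commit does not state which framework reads
# trainer.yaml and deepspeed_zero2.json. Here we assume the Hugging Face
# `transformers` Trainer; output_dir and bf16=True are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama3-8b-continued",  # hypothetical output path
    learning_rate=2e-5,                  # trainer.yaml: learning_rate
    warmup_steps=100,                    # trainer.yaml: warmup_steps
    lr_scheduler_type="cosine",          # trainer.yaml: lr_scheduler
    per_device_train_batch_size=1,       # trainer.yaml: micro_batch_size
    num_train_epochs=1,                  # trainer.yaml: num_epochs
    bf16=True,                           # resolves the "auto" bf16/fp16 blocks in the JSON
    deepspeed="deepspeed_zero2.json",    # ZeRO stage 2, optimizer states offloaded to CPU
)
# sequence_len: 8192 is not a TrainingArguments field; it would be applied when
# tokenizing/packing the dataset (e.g. max_length=8192). The "auto" batch-size,
# gradient-accumulation and gradient-clipping entries in the JSON are likewise
# filled in from TrainingArguments at launch time.
```

In this setup, ZeRO stage 2 shards optimizer states and gradients across the GPUs, and `offload_optimizer: cpu` moves the optimizer state into host memory to reduce per-GPU memory pressure.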