teddy-f-47 committed on
Commit
5c83f1a
1 Parent(s): db5f960

Update README.md

Files changed (1)
  1. README.md (+13, -7)
README.md CHANGED
@@ -13,7 +13,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # phi-1_5-pl-v_0_1
 
- This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on an unknown dataset.
+ This model is based on [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5). It is trained from scratch on the 20231201 Polish Wikipedia dump.
 
  ## Model description
 
@@ -25,30 +25,36 @@ More information needed
 
  ## Training and evaluation data
 
- More information needed
+ The 20231201 Polish Wikipedia dump.
 
  ## Training procedure
 
+ ### Training environment
+
+ GPU: 4 x RTX 4090 (24GB per GPU, 96GB total)
+ CPU: AMD EPYC 75F3 32-core (128 virtual cores)
+ RAM: 258GB
+ Motherboard: ROME2D32GM, PCIe 4.0 x16
+ Storage: NVMe, 194.0GB
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 4
+ - train_batch_size: 2
  - gradient_accumulation_steps: 8
  - total_train_batch_size: 64
- - total_eval_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 2
+ - seed: 42
 
  ### Training results
 
-
+ train_loss: 2.727
 
  ### Framework versions
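The hyperparameter list in the updated card maps directly onto `transformers.TrainingArguments`. The snippet below is a minimal sketch of that mapping, assuming the standard Hugging Face `Trainer` stack; the output directory is a placeholder and this is not the author's actual training script.

```python
# Minimal sketch, assuming the Hugging Face Trainer stack; not the author's script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-1_5-pl-v_0_1",   # placeholder output path
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=2,   # train_batch_size: 2
    gradient_accumulation_steps=8,   # gradient_accumulation_steps: 8
    num_train_epochs=2,              # num_epochs: 2
    lr_scheduler_type="cosine",      # lr_scheduler_type: cosine
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio: 0.1
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9,0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-7,               # and epsilon=1e-07
    seed=42,                         # seed: 42
)
# distributed_type: multi-GPU / num_devices: 4 come from the launcher, e.g.
# `torchrun --nproc_per_node=4 train.py`, which gives the listed total train
# batch size of 2 (per device) x 4 (GPUs) x 8 (accumulation steps) = 64.
```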