teddy-f-47 committed on
Commit
5c83f1a
1 Parent(s): db5f960

Update README.md

Files changed (1)
  1. README.md (+13, -7)
README.md CHANGED
@@ -13,7 +13,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # phi-1_5-pl-v_0_1
 
- This model is a fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) on an unknown dataset.
+ This model is based on [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5). It is trained from scratch on the 20231201 Polish Wikipedia dump.
 
  ## Model description
 
@@ -25,30 +25,36 @@ More information needed
 
  ## Training and evaluation data
 
- More information needed
+ The 20231201 Polish Wikipedia dump.
 
  ## Training procedure
 
+ ### Training environment
+
+ GPU: 4 x RTX 4090 (24GB per GPU, 96GB total)
+ CPU: AMD EPYC 75F3 32-core (128 virtual cores)
+ RAM: 258GB
+ Motherboard: ROME2D32GM, PCIe 4.0 x16
+ Storage: NVMe, 194.0GB
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 4
+ - train_batch_size: 2
  - gradient_accumulation_steps: 8
  - total_train_batch_size: 64
- - total_eval_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 2
+ - seed: 42
 
  ### Training results
 
-
+ train_loss: 2.727
 
  ### Framework versions
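The hyperparameter list in the updated card maps directly onto `transformers.TrainingArguments`. The snippet below is a minimal sketch of that mapping, assuming the standard Hugging Face `Trainer` stack; the output directory is a placeholder and this is not the author's actual training script.

```python
# Minimal sketch, assuming the Hugging Face Trainer stack; not the author's script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-1_5-pl-v_0_1",   # placeholder output path
    learning_rate=2e-4,              # learning_rate: 0.0002
    per_device_train_batch_size=2,   # train_batch_size: 2
    gradient_accumulation_steps=8,   # gradient_accumulation_steps: 8
    num_train_epochs=2,              # num_epochs: 2
    lr_scheduler_type="cosine",      # lr_scheduler_type: cosine
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio: 0.1
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9,0.98)
    adam_beta2=0.98,
    adam_epsilon=1e-7,               # and epsilon=1e-07
    seed=42,                         # seed: 42
)
# distributed_type: multi-GPU / num_devices: 4 come from the launcher, e.g.
# `torchrun --nproc_per_node=4 train.py`, which gives the listed total train
# batch size of 2 (per device) x 4 (GPUs) x 8 (accumulation steps) = 64.
```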