metadata
license: other
base_model: microsoft/phi-1_5
tags:
- generated_from_trainer
model-index:
- name: phi-1_5-pl-v_0_1
results: []
phi-1_5-pl-v_0_1
This model is based on microsoft/phi-1_5, but it was trained from scratch on the 20231201 Polish Wikipedia dump rather than fine-tuned from the original weights.
Model description
The model was trained with a context length of 1024 tokens.
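Because of the 1024-token training context, longer inputs probably need to be shortened or split before they reach the model. The following is a minimal sketch that works on plain lists of token IDs. The helper names `truncate_left` and `split_into_windows` are hypothetical and not part of this model or of any library, and keeping the most recent tokens is only one possible policy.

```python
MAX_CONTEXT = 1024  # context length stated in this model card


def truncate_left(token_ids, max_len=MAX_CONTEXT):
    """Keep only the last `max_len` token IDs (the most recent context)."""
    return list(token_ids[-max_len:])


def split_into_windows(token_ids, max_len=MAX_CONTEXT):
    """Split a long sequence into consecutive, non-overlapping windows."""
    return [
        list(token_ids[start:start + max_len])
        for start in range(0, len(token_ids), max_len)
    ]


# Hypothetical input: 2500 dummy token IDs standing in for a tokenized text.
dummy_ids = list(range(2500))
print(len(truncate_left(dummy_ids)))
print([len(w) for w in split_into_windows(dummy_ids)])
```

When generating text, the prompt would need to be shorter than 1024 tokens so that the prompt plus the newly generated tokens still fit in the context.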
Intended uses & limitations
The model is intended for research purposes only. It may generate fictitious, incorrect, unethical, or biased text. In its current state, it is not suitable for production use.
Training and evaluation data
The 20231201 Polish Wikipedia dump.
Training procedure
Training environment
- GPU: 4 x RTX4090 (24GB per GPU, 96GB total)
- CPU: AMD EPYC 75F3 32-core (128 virtual cores)
- RAM: 258GB
- Motherboard: ROME2D32GM, PCIe 4.0 x16
- Storage: NVMe, 194.0GB
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- distributed_type: multi-GPU (DDP)
- num_devices: 4
- train_batch_size: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
- precision: bf16
- seed: 42
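The relationship between these values can be checked with a short sketch. It shows how the total batch size of 64 follows from the per-device batch size, the gradient accumulation steps, and the number of devices, and how a cosine schedule with a 10% linear warmup would shape the learning rate. The function names are hypothetical, and `total_steps = 1000` is an assumed placeholder, since the card does not report the actual number of optimizer steps. The schedule formula is a common formulation and may differ in detail from the one used in training.

```python
import math


def effective_batch_size(per_device, grad_accum_steps, num_devices):
    """Number of sequences contributing to one optimizer step."""
    return per_device * grad_accum_steps * num_devices


def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.1):
    """Linear warmup to `base_lr`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(2, 8, 4))  # 2 * 8 * 4 = 64

total_steps = 1000  # assumed placeholder, not the real step count
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

With these assumptions the learning rate rises linearly to 0.0002 over the first 10% of steps and then decays along a half cosine to zero.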
Training results
- runtime: 2d 21h 26m 36s
- train_loss: 2.727
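The training loss can be converted into a perplexity for easier interpretation. The sketch below assumes that `train_loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated explicitly in this card. The function name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_cross_entropy_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy_nats)


train_loss = 2.727  # value reported in the training results
print(f"train perplexity ~ {perplexity_from_loss(train_loss):.1f}")
```

Under that assumption the reported loss corresponds to a training perplexity of roughly 15.3. This is a figure for the training data only and says nothing about held-out performance.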
Framework versions
- Transformers 4.36.2
- PyTorch 2.1.2
- Datasets 2.14.7
- Tokenizers 0.15.0