---
license: other
base_model: microsoft/phi-1_5
tags:
  - generated_from_trainer
model-index:
  - name: phi-1_5-pl-v_0_1
    results: []
---

# phi-1_5-pl-v_0_1

This model is based on the microsoft/phi-1_5 architecture and was trained from scratch on the 20231201 Polish Wikipedia dump.

## Model description

The model was trained with a context length of 1024 tokens.
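
A minimal loading and generation sketch is shown below. It assumes the repository id `teddy-f-47/phi-pl-400M-v_0_1` and the standard `transformers` text-generation API; the prompt and generation settings are illustrative only, and `trust_remote_code=True` may be needed on Transformers releases that predate built-in Phi support.

```python
# Hypothetical usage sketch; repository id, prompt, and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teddy-f-47/phi-pl-400M-v_0_1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code may be required on Transformers versions without native Phi support
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Stolica Polski to"  # "The capital of Poland is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,   # stays well inside the 1024-token context
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```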

## Intended uses & limitations

The model is intended for research purposes only. It may generate fictitious, incorrect, unethical, or biased text. In its current state, it is not suitable for production use.

## Training and evaluation data

The 20231201 Polish Wikipedia dump.
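
The card does not describe the preprocessing pipeline. Below is a hedged sketch of one plausible preparation step, packing the Wikipedia articles into 1024-token blocks with the `datasets` library; the file name `plwiki-20231201-articles.jsonl`, the `text` column, and the reuse of the original `microsoft/phi-1_5` tokenizer are all assumptions.

```python
# Hypothetical preprocessing sketch; not the authors' actual pipeline.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")  # assumed tokenizer
block_size = 1024  # matches the context length stated above

# Assumed local export of the 20231201 Polish Wikipedia dump with a "text" column.
raw = load_dataset("json", data_files="plwiki-20231201-articles.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"])

def group_texts(examples):
    # Concatenate all token ids and split them into fixed 1024-token blocks.
    concatenated = sum(examples["input_ids"], [])
    total = (len(concatenated) // block_size) * block_size
    blocks = [concatenated[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)
```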

## Training procedure

### Training environment

- GPU: 4 x RTX 4090 (24 GB per GPU, 96 GB total)
- CPU: AMD EPYC 75F3, 32-core (128 virtual cores)
- RAM: 258 GB
- Motherboard: ROME2D32GM, PCIe 4.0 x16
- Storage: 194.0 GB NVMe

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 0.0002
- distributed_type: multi-GPU (DDP)
- num_devices: 4
- train_batch_size: 2 (per device)
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
- precision: bf16
- seed: 42
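
For reference, the values above map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction rather than the authors' training script; the output directory and launch command are assumptions.

```python
# Illustrative mapping of the listed hyperparameters; not the authors' actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-1_5-pl-v_0_1",      # assumed output directory
    learning_rate=2e-4,
    per_device_train_batch_size=2,      # 2 x 4 GPUs x 8 accumulation steps = 64 effective
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    bf16=True,
    seed=42,
)

# DDP across 4 GPUs would typically be launched with, e.g.:
#   torchrun --nproc_per_node=4 train.py   (train.py is hypothetical)
```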

### Training results

- runtime: 2d 21h 26m 36s
- train_loss: 2.727

### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2
- Datasets 2.14.7
- Tokenizers 0.15.0