---
license: other
base_model: microsoft/phi-1_5
tags:
- generated_from_trainer
model-index:
- name: phi-1_5-pl-v_0_1
  results: []
---

# phi-1_5-pl-v_0_1

This model uses the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) architecture and was trained from scratch on the 20231201 Polish Wikipedia dump.

## Model description

The model was trained with a context length of 1024 tokens.

## Intended uses & limitations

The model is intended for research purposes only. It may generate fictitious, incorrect, unethical, or biased text. In its current state, it is not suitable for production use.

## Training and evaluation data

The 20231201 Polish Wikipedia dump.

## Training procedure

### Training environment

- GPU: 4 x RTX 4090 (24 GB per GPU, 96 GB total)
- CPU: AMD EPYC 75F3, 32 cores (128 virtual cores)
- RAM: 258 GB
- Motherboard: ROME2D32GM, PCIe 4.0 x16
- Storage: 194.0 GB NVMe

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction is given in the training configuration sketch below):
- learning_rate: 0.0002
- distributed_type: multi-GPU (DDP)
- num_devices: 4
- train_batch_size: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
- precision: bf16
- seed: 42

### Training results

- runtime: 2d 21h 26m 36s
- train_loss: 2.727

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.7
- Tokenizers 0.15.0
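
### Training configuration sketch

The hyperparameters listed above roughly correspond to the following Transformers `TrainingArguments`. This is a reconstruction for reference, not the original training script; the output directory and any launch details are assumptions.

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
# Not the original training script; output_dir and other details are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-1_5-pl-v_0_1",   # hypothetical output directory
    learning_rate=2e-4,
    per_device_train_batch_size=2,   # 2 x 4 GPUs x 8 accumulation steps = total batch of 64
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    bf16=True,
    seed=42,
)
```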
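
## Example usage

The snippet below is a minimal generation sketch, not an official example from the training code. The repository id `phi-1_5-pl-v_0_1` is an assumption; replace it with the actual Hub path. With Transformers 4.36.x, loading the phi architecture may require `trust_remote_code=True`.

```python
# Minimal sketch: load the model and generate Polish text.
# The repository id is an assumption; replace it with the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "phi-1_5-pl-v_0_1"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    trust_remote_code=True,      # may be needed for the phi architecture on Transformers 4.36.x
)

prompt = "Warszawa jest stolicą"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```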