myBit-Llama2-jp-127M-test-4

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 10.6247
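
A minimal loading sketch, assuming the checkpoint is published under the Hub id HachiML/myBit-Llama2-jp-127M-test-4 (the repo this card belongs to) and loads with the standard auto classes; if the repo ships a custom BitNet-style architecture, `trust_remote_code=True` may be required:

```python
# Minimal inference sketch. Assumes the checkpoint loads via the standard
# AutoModelForCausalLM / AutoTokenizer classes; trust_remote_code may be
# needed if the repo defines a custom BitLlama architecture (assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HachiML/myBit-Llama2-jp-127M-test-4"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)

# Japanese prompt, since this is a -jp model.
inputs = tokenizer("こんにちは、", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```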

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the Trainer-style sketch after this list):

  • learning_rate: 8.4e-05
  • train_batch_size: 96
  • eval_batch_size: 96
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 250
  • num_epochs: 1
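
A hedged reconstruction of this configuration as `transformers.TrainingArguments`. The model, datasets, and output directory are placeholders (not stated in the card), and the batch size is assumed to be per-device on a single device:

```python
# Sketch of the training configuration from the list above.
# Paths and datasets are placeholders; only the listed values are from the card.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="myBit-Llama2-jp-127M-test-4",  # placeholder
    learning_rate=8.4e-5,
    per_device_train_batch_size=96,  # card reports train_batch_size=96
    per_device_eval_batch_size=96,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="polynomial",
    warmup_steps=250,
    num_train_epochs=1,
    evaluation_strategy="steps",  # the results table logs every 100 steps
    eval_steps=100,
    logging_steps=100,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```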

Training results

| Training Loss | Epoch | Step | Validation Loss |
|--------------:|------:|-----:|----------------:|
| 9.6724 | 0.04 | 100 | 8.7189 |
| 7.811 | 0.07 | 200 | 6.9856 |
| 6.7931 | 0.11 | 300 | 6.5599 |
| 6.4108 | 0.15 | 400 | 6.1841 |
| 6.1428 | 0.18 | 500 | 5.9554 |
| 5.8814 | 0.22 | 600 | 5.7176 |
| 5.6803 | 0.26 | 700 | 5.5171 |
| 5.5181 | 0.29 | 800 | 5.4037 |
| 5.4115 | 0.33 | 900 | 5.3197 |
| 5.3497 | 0.37 | 1000 | 5.2965 |
| 5.3629 | 0.40 | 1100 | 5.3632 |
| 5.6291 | 0.44 | 1200 | 5.9554 |
| 6.9173 | 0.47 | 1300 | 8.0749 |
| 9.1158 | 0.51 | 1400 | 9.8847 |
| 10.2012 | 0.55 | 1500 | 10.3942 |
| 10.4725 | 0.58 | 1600 | 10.5218 |
| 10.5453 | 0.62 | 1700 | 10.5627 |
| 10.5752 | 0.66 | 1800 | 10.5838 |
| 10.5915 | 0.69 | 1900 | 10.5969 |
| 10.6018 | 0.73 | 2000 | 10.6053 |
| 10.6091 | 0.77 | 2100 | 10.6115 |
| 10.6141 | 0.80 | 2200 | 10.6156 |
| 10.6175 | 0.84 | 2300 | 10.6186 |
| 10.6203 | 0.88 | 2400 | 10.6212 |
| 10.6225 | 0.91 | 2500 | 10.6225 |
| 10.6238 | 0.95 | 2600 | 10.6240 |
| 10.625 | 0.99 | 2700 | 10.6247 |
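
The table shows a clear divergence: validation loss bottoms out around 5.30 near step 1000 and then climbs back to the ~10.62 plateau reported as the final evaluation loss. A minimal matplotlib sketch, with values copied verbatim from the table, to visualize the run:

```python
# Plot the training/validation loss curves from the results table above.
import matplotlib.pyplot as plt

steps = list(range(100, 2800, 100))
train_loss = [9.6724, 7.811, 6.7931, 6.4108, 6.1428, 5.8814, 5.6803,
              5.5181, 5.4115, 5.3497, 5.3629, 5.6291, 6.9173, 9.1158,
              10.2012, 10.4725, 10.5453, 10.5752, 10.5915, 10.6018,
              10.6091, 10.6141, 10.6175, 10.6203, 10.6225, 10.6238, 10.625]
val_loss = [8.7189, 6.9856, 6.5599, 6.1841, 5.9554, 5.7176, 5.5171,
            5.4037, 5.3197, 5.2965, 5.3632, 5.9554, 8.0749, 9.8847,
            10.3942, 10.5218, 10.5627, 10.5838, 10.5969, 10.6053,
            10.6115, 10.6156, 10.6186, 10.6212, 10.6225, 10.6240, 10.6247]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```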

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
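
A small sketch to check that a local environment matches these pinned versions before trying to reproduce the run (version strings taken from the list above):

```python
# Compare installed versions against the ones this model was trained with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
found = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if found[name] == want else f"got {found[name]}"
    print(f"{name}: expected {want} -> {status}")
```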