myBit-Llama2-jp-127M-test-4

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 10.6247
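
A minimal loading sketch, assuming the checkpoint is published under the Hub id HachiML/myBit-Llama2-jp-127M-test-4 (the repo this card belongs to) and loads with the standard auto classes; if the repo ships a custom BitNet-style architecture, `trust_remote_code=True` may be required:

```python
# Minimal inference sketch. Assumes the checkpoint loads via the standard
# AutoModelForCausalLM / AutoTokenizer classes; trust_remote_code may be
# needed if the repo defines a custom BitLlama architecture (assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HachiML/myBit-Llama2-jp-127M-test-4"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)

# Japanese prompt, since this is a -jp model.
inputs = tokenizer("こんにちは、", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```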

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the Trainer-style sketch after this list):

  • learning_rate: 8.4e-05
  • train_batch_size: 96
  • eval_batch_size: 96
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 250
  • num_epochs: 1
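
A hedged reconstruction of this configuration as `transformers.TrainingArguments`. The model, datasets, and output directory are placeholders (not stated in the card), and the batch size is assumed to be per-device on a single device:

```python
# Sketch of the training configuration from the list above.
# Paths and datasets are placeholders; only the listed values are from the card.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="myBit-Llama2-jp-127M-test-4",  # placeholder
    learning_rate=8.4e-5,
    per_device_train_batch_size=96,  # card reports train_batch_size=96
    per_device_eval_batch_size=96,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="polynomial",
    warmup_steps=250,
    num_train_epochs=1,
    evaluation_strategy="steps",  # the results table logs every 100 steps
    eval_steps=100,
    logging_steps=100,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```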

Training results

| Training Loss | Epoch | Step | Validation Loss |
|--------------:|------:|-----:|----------------:|
| 9.6724 | 0.04 | 100 | 8.7189 |
| 7.811 | 0.07 | 200 | 6.9856 |
| 6.7931 | 0.11 | 300 | 6.5599 |
| 6.4108 | 0.15 | 400 | 6.1841 |
| 6.1428 | 0.18 | 500 | 5.9554 |
| 5.8814 | 0.22 | 600 | 5.7176 |
| 5.6803 | 0.26 | 700 | 5.5171 |
| 5.5181 | 0.29 | 800 | 5.4037 |
| 5.4115 | 0.33 | 900 | 5.3197 |
| 5.3497 | 0.37 | 1000 | 5.2965 |
| 5.3629 | 0.40 | 1100 | 5.3632 |
| 5.6291 | 0.44 | 1200 | 5.9554 |
| 6.9173 | 0.47 | 1300 | 8.0749 |
| 9.1158 | 0.51 | 1400 | 9.8847 |
| 10.2012 | 0.55 | 1500 | 10.3942 |
| 10.4725 | 0.58 | 1600 | 10.5218 |
| 10.5453 | 0.62 | 1700 | 10.5627 |
| 10.5752 | 0.66 | 1800 | 10.5838 |
| 10.5915 | 0.69 | 1900 | 10.5969 |
| 10.6018 | 0.73 | 2000 | 10.6053 |
| 10.6091 | 0.77 | 2100 | 10.6115 |
| 10.6141 | 0.80 | 2200 | 10.6156 |
| 10.6175 | 0.84 | 2300 | 10.6186 |
| 10.6203 | 0.88 | 2400 | 10.6212 |
| 10.6225 | 0.91 | 2500 | 10.6225 |
| 10.6238 | 0.95 | 2600 | 10.6240 |
| 10.625 | 0.99 | 2700 | 10.6247 |
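
The table shows a clear divergence: validation loss bottoms out around 5.30 near step 1000 and then climbs back to the ~10.62 plateau reported as the final evaluation loss. A minimal matplotlib sketch, with values copied verbatim from the table, to visualize the run:

```python
# Plot the training/validation loss curves from the results table above.
import matplotlib.pyplot as plt

steps = list(range(100, 2800, 100))
train_loss = [9.6724, 7.811, 6.7931, 6.4108, 6.1428, 5.8814, 5.6803,
              5.5181, 5.4115, 5.3497, 5.3629, 5.6291, 6.9173, 9.1158,
              10.2012, 10.4725, 10.5453, 10.5752, 10.5915, 10.6018,
              10.6091, 10.6141, 10.6175, 10.6203, 10.6225, 10.6238, 10.625]
val_loss = [8.7189, 6.9856, 6.5599, 6.1841, 5.9554, 5.7176, 5.5171,
            5.4037, 5.3197, 5.2965, 5.3632, 5.9554, 8.0749, 9.8847,
            10.3942, 10.5218, 10.5627, 10.5838, 10.5969, 10.6053,
            10.6115, 10.6156, 10.6186, 10.6212, 10.6225, 10.6240, 10.6247]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```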

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
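
A small sketch to check that a local environment matches these pinned versions before trying to reproduce the run (version strings taken from the list above):

```python
# Compare installed versions against the ones this model was trained with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
found = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if found[name] == want else f"got {found[name]}"
    print(f"{name}: expected {want} -> {status}")
```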