metadata

language:
  - ko
  - ja
base_model: facebook/mbart-large-50-many-to-many-mmt
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: mbart-mmt_mid3_ko-ja
    results: []

mbart-mmt_mid3_ko-ja

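This model is a fine-tuned version of facebook/mbart-large-50-many-to-many-mmt for Korean-Japanese translation (per the model name and language tags); the training dataset is not specified. It achieves the following results on the evaluation set, which match the step 25500 row (epoch 3.87) of the training results, the row with the lowest validation loss: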

Loss: 0.8652
Bleu: 10.1883
Gen Len: 17.2057

Model description

More information needed

Intended uses & limitations

More information needed
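
A minimal inference sketch, assuming this checkpoint keeps the standard mBART-50 interface of its base model and translates from Korean to Japanese (a direction inferred from the model name, not stated in this card). `MODEL_PATH` is a placeholder rather than a confirmed repository ID, and `translate` is a hypothetical helper name; `ko_KR` and `ja_XX` are the mBART-50 language codes for Korean and Japanese.

```python
# mBART-50 language codes for Korean and Japanese.
SRC_LANG = "ko_KR"
TGT_LANG = "ja_XX"

# Placeholder (assumption): replace with the local directory or Hub ID
# where this checkpoint is actually stored.
MODEL_PATH = "path/to/mbart-mmt_mid3_ko-ja"


def translate(texts, model_path=MODEL_PATH, max_new_tokens=64):
    """Translate a list of Korean sentences to Japanese (assumed direction)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(model_path)
    model = MBartForConditionalGeneration.from_pretrained(model_path)

    # Tell the tokenizer which language the input is in.
    tokenizer.src_lang = SRC_LANG
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    # Force the decoder to start with the target-language token.
    generated = model.generate(
        **batch,
        forced_bos_token_id=tokenizer.lang_code_to_id[TGT_LANG],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["안녕하세요, 만나서 반갑습니다."])` should return a list containing one Japanese string. Output quality is not verified here, and `max_new_tokens=64` is an arbitrary choice for the sketch.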

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 32
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 35 (configured value; the training results below end at epoch 4.78)
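
The relationship between the per-device and total batch sizes above, and the shape of a linear schedule with warmup, can be sketched as follows. This is an illustration, not the training script: `TrainConfig` and `linear_lr` are hypothetical names, and `ASSUMED_TOTAL_STEPS` is a rough estimate derived from the rounded epoch values in the training log (about 6,590 steps per epoch times 35 configured epochs), not a figure reported by this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the list above."""
    learning_rate: float = 5e-5
    train_batch_size: int = 4          # per device
    eval_batch_size: int = 4           # per device
    num_devices: int = 4
    gradient_accumulation_steps: int = 2
    warmup_steps: int = 500

    @property
    def total_train_batch_size(self) -> int:
        # Examples consumed per optimizer step across all devices.
        return (self.train_batch_size
                * self.num_devices
                * self.gradient_accumulation_steps)

    @property
    def total_eval_batch_size(self) -> int:
        # Gradient accumulation does not apply during evaluation.
        return self.eval_batch_size * self.num_devices


def linear_lr(step: int, cfg: TrainConfig, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / max(1, cfg.warmup_steps)
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps - cfg.warmup_steps)


cfg = TrainConfig()

# Assumption: estimated from the log (31500 steps at epoch ~4.78), times 35 epochs.
ASSUMED_TOTAL_STEPS = 230_650

print(cfg.total_train_batch_size, cfg.total_eval_batch_size)
print(linear_lr(250, cfg, ASSUMED_TOTAL_STEPS))
```

The computed totals agree with the reported `total_train_batch_size` of 32 and `total_eval_batch_size` of 16. The learning-rate values depend on the assumed total step count and should be read only as the shape of the schedule.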

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
1.6216	0.23	1500	1.5229	2.686	17.599
1.3587	0.46	3000	1.3061	4.0749	17.3772
1.2279	0.68	4500	1.1881	5.2878	17.3642
1.1408	0.91	6000	1.0994	5.4783	17.4093
0.9977	1.14	7500	1.0313	7.6015	17.36
0.9582	1.37	9000	0.9918	8.2303	17.3526
0.9525	1.59	10500	0.9811	8.2837	17.2597
0.9415	1.82	12000	0.9589	8.1592	17.2241
0.856	2.05	13500	0.9462	7.8401	17.4066
0.8273	2.28	15000	0.9336	8.6082	17.1918
0.8066	2.5	16500	0.9220	9.7751	17.5198
0.784	2.73	18000	0.8949	10.292	17.4097
0.8016	2.96	19500	0.8958	9.0262	17.4097
0.6872	3.19	21000	0.9043	9.7549	17.2672
0.7107	3.42	22500	0.8994	10.3016	17.0973
0.6726	3.64	24000	0.8747	10.5183	17.2871
0.6699	3.87	25500	0.8652	10.1883	17.2057
0.612	4.1	27000	0.8949	9.5697	17.2443
0.621	4.33	28500	0.8904	10.8592	17.329
0.6219	4.55	30000	0.8772	10.925	17.482
0.6164	4.78	31500	0.8694	11.8749	17.1624
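
To see which logged step the headline metrics come from, the table above can be scanned for its lowest validation loss and highest BLEU. A small sketch, with the `(step, validation_loss, bleu)` values copied from the table; the variable names are illustrative, and whether the published weights were selected by validation loss is an assumption this card does not confirm.

```python
# (step, validation_loss, bleu) rows copied from the training results table.
LOG = [
    (1500, 1.5229, 2.686),
    (3000, 1.3061, 4.0749),
    (4500, 1.1881, 5.2878),
    (6000, 1.0994, 5.4783),
    (7500, 1.0313, 7.6015),
    (9000, 0.9918, 8.2303),
    (10500, 0.9811, 8.2837),
    (12000, 0.9589, 8.1592),
    (13500, 0.9462, 7.8401),
    (15000, 0.9336, 8.6082),
    (16500, 0.9220, 9.7751),
    (18000, 0.8949, 10.292),
    (19500, 0.8958, 9.0262),
    (21000, 0.9043, 9.7549),
    (22500, 0.8994, 10.3016),
    (24000, 0.8747, 10.5183),
    (25500, 0.8652, 10.1883),
    (27000, 0.8949, 9.5697),
    (28500, 0.8904, 10.8592),
    (30000, 0.8772, 10.925),
    (31500, 0.8694, 11.8749),
]

# Row with the lowest validation loss.
best_by_loss = min(LOG, key=lambda row: row[1])

# Row with the highest BLEU score.
best_by_bleu = max(LOG, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest BLEU:", best_by_bleu)
```

The lowest validation loss (0.8652 at step 25500) matches the evaluation results reported at the top of this card, while the highest BLEU in the log (11.8749) occurs later, at step 31500.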

Framework versions

Transformers 4.34.0
PyTorch 2.1.0+cu121
Datasets 2.14.5
Tokenizers 0.14.1