trollek committed (verified) · Commit 6216eb1 · Parent(s): c002486

Update README.md

Files changed (1): README.md (+140, -3)

---
license: llama3.1
datasets:
- trollek/Danoia-v03
- trollek/Danoia-v02
- N8Programs/CreativeGPT
- Gryphe/Opus-WritingPrompts
language:
- da
- en
base_model:
- unsloth/Meta-Llama-3.1-8B-Instruct
library_name: transformers
tags:
- llama-factory
- lora
- unsloth
---

# Llama 3.1 8B Danoia

This model is a fine-tuned version of [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) on the danoia_v03, opus_writing_instruct, creativegpt, and danoia_v02_no_system datasets, plus some private datasets related to evaluation.

It achieves the following results on the evaluation set:
- Loss: 0.7108

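The public parts of the training mix are listed in the metadata above. As a small sketch for inspecting one of them with the `datasets` library (the split name `train` is an assumption; adjust if the dataset uses different splits):

```python
from datasets import load_dataset

# Load one of the public training datasets listed in the metadata.
danoia = load_dataset("trollek/Danoia-v03", split="train")

print(danoia)      # dataset size and column names
print(danoia[0])   # first example
```
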
## Model description

This model can write stories in Danish and English. It can likely do much more, but not more than the vanilla model it is based on.

## Intended uses & limitations

Danoia is intended to be a private assistant able to write essays, summarise articles, and help out in general. It occasionally misspells Danish words, though only rarely.

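A minimal usage sketch with the Transformers chat-template API is shown below. The repository id is a hypothetical placeholder for illustration; substitute the actual model path (or a local directory) before running it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: hypothetical repo id, used for illustration only.
model_id = "trollek/Llama-3.1-8B-Danoia"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Danish prompt: "Write a short essay about wind energy in Denmark."
messages = [{"role": "user", "content": "Skriv et kort essay om vindenergi i Danmark."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
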
## Training and evaluation data

I trained this using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory "LLaMA-Factory's GitHub") with [Unsloth](https://github.com/unslothai/unsloth "Unsloth's GitHub") enabled on a 16GB RTX 4060 Ti. Training took 30 hours and peaked at 13GB of VRAM usage.

<details>

<summary>Show LLaMA-Factory config</summary>

```yaml
### model
model_name_or_path: unsloth/Meta-Llama-3.1-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 16
lora_alpha: 32
use_unsloth: true
use_unsloth_gc: true
quantization_bit: 4
upcast_layernorm: true
seed: 192

### dataset
dataset: danoia_v03,opus_writing_instruct,creativegpt,danoia_v02_no_system
template: llama3
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: llama31/8b_instruct/loras/danoia
logging_steps: 1
save_steps: 500
save_strategy: steps
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 1.5e-5
num_train_epochs: 1.5
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```
</details>
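With LLaMA-Factory installed, a config like this is typically launched through its CLI, e.g. `llamafactory-cli train danoia_sft.yaml`, where the file name is simply whatever you saved the config as.
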

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.5e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 192
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1.5

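For readers who want to approximate this run outside LLaMA-Factory, the list above maps roughly onto the following Transformers `TrainingArguments`. This is an illustrative sketch only: LLaMA-Factory builds its own arguments internally, and the LoRA, Unsloth, and 4-bit quantization settings are not represented here.

```python
from transformers import TrainingArguments

# Rough equivalent of the hyperparameters listed above (illustrative only).
training_args = TrainingArguments(
    output_dir="llama31/8b_instruct/loras/danoia",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size: 2 * 4 = 8
    learning_rate=1.5e-5,
    num_train_epochs=1.5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    bf16=True,
    seed=192,
    logging_steps=1,
    save_strategy="steps",
    save_steps=500,
    eval_strategy="steps",
    eval_steps=500,
)
```
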
### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.2352        | 0.0719 | 500   | 0.8450          |
| 0.1742        | 0.1438 | 1000  | 0.8090          |
| 0.1667        | 0.2156 | 1500  | 0.7889          |
| 0.3791        | 0.2875 | 2000  | 0.7750          |
| 0.1989        | 0.3594 | 2500  | 0.7665          |
| 0.2347        | 0.4313 | 3000  | 0.7563          |
| 0.1694        | 0.5032 | 3500  | 0.7498          |
| 0.2351        | 0.5750 | 4000  | 0.7412          |
| 0.2322        | 0.6469 | 4500  | 0.7363          |
| 0.1689        | 0.7188 | 5000  | 0.7298          |
| 0.1953        | 0.7907 | 5500  | 0.7250          |
| 0.2099        | 0.8626 | 6000  | 0.7214          |
| 0.2368        | 0.9344 | 6500  | 0.7166          |
| 0.1632        | 1.0063 | 7000  | 0.7151          |
| 0.1558        | 1.0782 | 7500  | 0.7157          |
| 0.2854        | 1.1501 | 8000  | 0.7139          |
| 0.199         | 1.2220 | 8500  | 0.7127          |
| 0.1606        | 1.2938 | 9000  | 0.7117          |
| 0.1788        | 1.3657 | 9500  | 0.7112          |
| 0.2618        | 1.4376 | 10000 | 0.7109          |

### Framework versions

- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1
- Datasets 3.1.0
- Tokenizers 0.20.3
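
Since this was trained as a LoRA with PEFT, here is a hedged sketch of attaching the adapter to the base model and merging it, in case you are working from adapter files rather than already-merged weights. The adapter repo id below is a hypothetical placeholder.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Meta-Llama-3.1-8B-Instruct"
adapter_id = "trollek/Llama-3.1-8B-Danoia"  # hypothetical placeholder for the adapter repo/path

# Load the base model, attach the LoRA adapter, then fold the LoRA weights in.
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
merged_model = model.merge_and_unload()

# Save a standalone merged checkpoint alongside the tokenizer.
merged_model.save_pretrained("llama31-8b-danoia-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("llama31-8b-danoia-merged")
```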