seungduk committed
Commit a4ddde9 · 1 Parent(s): 819da48

Update README.md

Files changed (1):
  1. README.md +37 -17

README.md CHANGED
@@ -8,29 +8,27 @@ model-index:
   results: []
   ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- # data/seungduk/out-solar-both
+ # yanolja/KoSOLAR-10.7B-v0.1
 
- This model is a fine-tuned version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) on the None dataset.
+ This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on HuggingFace.
+ The hypothesis was that, while preserving the base model's original performance, more tokens could be added to its vocabulary by training the embeddings for the new tokens only. The evaluation results below suggest that performance in both English and Korean was preserved.
 
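For illustration, this kind of vocabulary extension can be done through the tokenizer API of transformers. A minimal sketch, with hypothetical Korean tokens standing in for the actual added vocabulary:

```python
# Minimal sketch of vocabulary extension (not the actual token list used).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
new_tokens = ["안녕하세요", "대한민국"]  # hypothetical example tokens
num_added = tokenizer.add_tokens(new_tokens)
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```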
- ## Model description
+ ## Model Description
 
- More information needed
+ Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen, except for the embed_tokens layer and the lm_head layer. Within those layers, the embeddings of the pre-existing tokens were kept frozen during training; only the embeddings of the newly added tokens were tuned (see the sketch below).
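A minimal PyTorch sketch of this freezing scheme, assuming the transformers API. It is illustrative only (the actual training was built with Axolotl), and the extended-tokenizer name is a hypothetical placeholder:

```python
# Sketch: freeze everything except embed_tokens/lm_head, and within those,
# let only the rows of the newly added tokens receive gradient updates.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0")
tokenizer = AutoTokenizer.from_pretrained("my-extended-korean-tokenizer")  # hypothetical

orig_vocab_size = model.get_input_embeddings().weight.shape[0]
model.resize_token_embeddings(len(tokenizer))  # append rows for the new tokens

# Freeze all parameters, then re-enable the two embedding matrices.
for param in model.parameters():
    param.requires_grad = False
model.get_input_embeddings().weight.requires_grad = True
model.get_output_embeddings().weight.requires_grad = True

def zero_existing_rows(grad: torch.Tensor) -> torch.Tensor:
    # Existing-token embeddings stay frozen: mask out their gradient rows.
    grad = grad.clone()
    grad[:orig_vocab_size] = 0
    return grad

model.get_input_embeddings().weight.register_hook(zero_existing_rows)
model.get_output_embeddings().weight.register_hook(zero_existing_rows)
```

In effect, the optimizer sees the full embed_tokens and lm_head matrices as trainable, but gradients for the original rows are zeroed, so only the new-token embeddings move.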
 
- ## Intended uses & limitations
+ ## Intended Uses & Limitations
 
- More information needed
+ No instruction tuning has been performed; fine-tune this model for your specific purpose and use it with caution.
 
- ## Training and evaluation data
+ ## Training and Evaluation Data
 
- More information needed
+ Various Korean web-crawled datasets that are publicly available on HuggingFace.
 
- ## Training procedure
+ ## Training Procedure
 
- ### Training hyperparameters
+ ### Training Hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0003
@@ -42,18 +40,40 @@ The following hyperparameters were used during training:
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 256
  - total_eval_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
  - training_steps: 1800
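As a rough illustration, these values map onto Hugging Face `TrainingArguments` as sketched below; the output path and per-device batch size are hypothetical, since the per-device and device-count settings fall outside this hunk:

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# per_device_train_batch_size and output_dir are hypothetical placeholders.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out-kosolar",           # hypothetical
    learning_rate=3e-4,                 # 0.0003
    per_device_train_batch_size=8,      # hypothetical: 8 x 4 accum x 8 GPUs = 256 total
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    max_steps=1800,
)
```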
 
- ### Training results
+ ### Training Results
+
+ #### upstage/SOLAR-10.7B-v1.0
+
+ | Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
+ |------------|---------|------------|--------|-------------|--------|---|--------|
+ | kmmlu      | N/A     | none       | 0      | acc         | 0.3004 | ± | 0.0528 |
+ |            |         | none       | 0      | acc_norm    | 0.3004 | ± | 0.0528 |
+ | gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5625 | ± | 0.0137 |
+ | hellaswag  | Yaml    | none       | 0      | acc         | 0.6393 | ± | 0.0048 |
+ | mmlu       | N/A     | none       | 0      | acc         | 0.6305 | ± | 0.1452 |
+ | truthfulqa | N/A     | none       | 0      | acc         | 0.4096 | ± | 0.0467 |
+ | winogrande | Yaml    | none       | 0      | acc         | 0.7443 | ± | 0.0123 |
 
+ #### yanolja/KoSOLAR-10.7B-v0.1
+
+ | Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
+ |------------|---------|------------|--------|-------------|--------|---|--------|
+ | kmmlu      | N/A     | none       | 0      | acc         | 0.2946 | ± | 0.0496 |
+ |            |         | none       | 0      | acc_norm    | 0.2946 | ± | 0.0496 |
+ | gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5527 | ± | 0.0137 |
+ | hellaswag  | Yaml    | none       | 0      | acc         | 0.6392 | ± | 0.0048 |
+ | mmlu       | N/A     | none       | 0      | acc         | 0.6303 | ± | 0.1411 |
+ | truthfulqa | N/A     | none       | 0      | acc         | 0.3618 | ± | 0.0472 |
+ | winogrande | Yaml    | none       | 0      | acc         | 0.7459 | ± | 0.0122 |
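These tables follow the output format of EleutherAI's lm-evaluation-harness. A sketch of how comparable numbers could be produced, assuming the harness's v0.4 Python API and current task names (the card does not state the harness revision or task configs actually used):

```python
# Hedged sketch: evaluating the model with lm-evaluation-harness (v0.4 API
# assumed; task names and settings are assumptions, not the card's exact setup).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yanolja/KoSOLAR-10.7B-v0.1",
    tasks=["kmmlu", "hellaswag", "mmlu", "truthfulqa", "winogrande"],
    num_fewshot=0,  # the table reports gsm8k separately at 5-shot
)
print(results["results"])
```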
 
- ### Framework versions
+ ### Framework Versions
 
  - Transformers 4.37.0.dev0
  - Pytorch 2.1.2+cu121
  - Datasets 2.16.0
- - Tokenizers 0.15.0
+ - Tokenizers 0.15.0