results: []
---
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

# yanolja/KoSOLAR-10.7B-v0.1
This model is a Korean vocabulary-extended version of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0), trained on various Korean web-crawled datasets that are publicly available on HuggingFace.

The hypothesis was that the base model's vocabulary could be extended by training embeddings for the new tokens only, while maintaining the base model's original performance. The evaluation results below suggest that performance in both English and Korean was preserved.
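To make the approach concrete, the sketch below shows how a vocabulary can be extended with the `transformers` API. The added tokens here are placeholders for illustration, not the actual vocabulary used for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer.
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
model = AutoModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0")

# Hypothetical new Korean tokens; the real token list is not published here.
new_tokens = ["안녕하세요", "대한민국"]
tokenizer.add_tokens(new_tokens)

# Grow embed_tokens and lm_head to cover the enlarged vocabulary.
# Only the newly added rows would then be trained.
model.resize_token_embeddings(len(tokenizer))
```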
## Model Description

Most parameters of [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) were frozen, except for the embed_tokens layer and the lm_head layer. Within those layers, the embeddings for the existing tokens were also frozen during training; only the embeddings for the new tokens were tuned.
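A minimal sketch of how such selective freezing can be implemented in PyTorch, reusing `model` from the snippet above and assuming the base model's original vocabulary size of 32,000; this illustrates the described setup, not the authors' training code:

```python
import torch

NUM_ORIGINAL_TOKENS = 32000  # assumed size of the base model's vocabulary

# Freeze all parameters, then re-enable only embed_tokens and lm_head.
for param in model.parameters():
    param.requires_grad = False
embed = model.get_input_embeddings()   # embed_tokens
head = model.get_output_embeddings()   # lm_head
embed.weight.requires_grad = True
head.weight.requires_grad = True

# Zero the gradient rows of pre-existing tokens so only new tokens update.
def keep_new_rows(grad: torch.Tensor) -> torch.Tensor:
    grad = grad.clone()
    grad[:NUM_ORIGINAL_TOKENS] = 0
    return grad

embed.weight.register_hook(keep_new_rows)
head.weight.register_hook(keep_new_rows)
```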
## Intended Uses & Limitations

This is a pretrained base model; no instruction tuning has been performed. You should fine-tune it for your specific purpose, and use it with caution.
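For reference, the model can be loaded for further fine-tuning or plain text completion as follows (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yanolja/KoSOLAR-10.7B-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "yanolja/KoSOLAR-10.7B-v0.1", torch_dtype="auto", device_map="auto"
)

# A base model does plain completion, not chat.
inputs = tokenizer("대한민국의 수도는", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```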
## Training and Evaluation Data

Various Korean web-crawled datasets that are openly available on HuggingFace.
## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 1800
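As a rough guide, these settings map onto `transformers.TrainingArguments` as sketched below; the per-device batch size and device count are not reported above and are therefore omitted:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is hypothetical, and the per-device batch sizes
# (which must multiply out to the totals above) are not reported.
args = TrainingArguments(
    output_dir="kosolar-vocab-extension",
    learning_rate=3e-4,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    max_steps=1800,
)
```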
### Training Results

#### upstage/SOLAR-10.7B-v1.0

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.3004 | ± | 0.0528 |
|            |         | none       | 0      | acc_norm    | 0.3004 | ± | 0.0528 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5625 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6393 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6305 | ± | 0.1452 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.4096 | ± | 0.0467 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7443 | ± | 0.0123 |
#### yanolja/KoSOLAR-10.7B-v0.1

| Groups     | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|------------|---------|------------|--------|-------------|--------|---|--------|
| kmmlu      | N/A     | none       | 0      | acc         | 0.2946 | ± | 0.0496 |
|            |         | none       | 0      | acc_norm    | 0.2946 | ± | 0.0496 |
| gsm8k      | Yaml    | get-answer | 5      | exact_match | 0.5527 | ± | 0.0137 |
| hellaswag  | Yaml    | none       | 0      | acc         | 0.6392 | ± | 0.0048 |
| mmlu       | N/A     | none       | 0      | acc         | 0.6303 | ± | 0.1411 |
| truthfulqa | N/A     | none       | 0      | acc         | 0.3618 | ± | 0.0472 |
| winogrande | Yaml    | none       | 0      | acc         | 0.7459 | ± | 0.0122 |
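The tables above follow the output format of EleutherAI's lm-evaluation-harness. A hedged sketch of reproducing a single 0-shot run through its Python API (task names and harness version are assumptions; note that gsm8k was run 5-shot):

```python
import lm_eval

# Evaluate the extended model on one benchmark from the tables above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yanolja/KoSOLAR-10.7B-v0.1",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```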
### Framework Versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0