---
language:
- ko
- en
pipeline_tag: text-generation
inference: false
tags:
- solar
- mistral
- pytorch
- solar-ko
library_name: transformers
license: apache-2.0
base_model: upstage/SOLAR-10.7B-v1.0
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/5e56829137cb5b49818287ea/WuiaS45EAWDurGTOtjR_d.png" style="max-width:250px;margin:0 auto;" />
**Update Log**
- 2024.07.01: Released Solar-Ko-Recovery and uploaded benchmark scores
- 2024.05.16: Released a preview of Solar-Ko-Recovery
# **Solar-Ko-Recovery-11B** 🌟❤️🩹
Solar-Ko-Recovery-11B aims to recover Solar's Korean capability by rearranging the embeddings and LM head, featuring an expanded vocabulary and a Korean+English corpus for enhanced representation.
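Below is a minimal text-generation sketch with 🤗 Transformers. The Hub repo id is an assumption inferred from this card's title and author; adjust it to the actual repository.

```python
# A minimal generation sketch, assuming the repo id below (not stated in
# this card) and a GPU with enough memory for the 11B model in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "beomi/Solar-Ko-Recovery-11B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("대한민국의 수도는", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```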
## Model Details
**Model Developers:** Junbum Lee (Beomi)
**Variations:** Solar-Ko-Recovery is available in a single parameter size: 11B (10.99B🤣).
**Input:** The model accepts only text input.
**Output:** The model produces text output exclusively.
**Model Architecture:**
Solar-Ko-Recovery is an auto-regressive language model that leverages an optimized transformer architecture derived from Llama-2.
| |Training Data|Parameters|Content Length|GQA|Tokens|Learning Rate|
|---|---|---|---|---|---|---|
|Solar-Ko-Recovery|*A curated mix of Korean+English corpora*|11B (10.99B)|4k|O|>100B*|5e-5|
> NOTE: training proceeded in two steps (a minimal sketch of step 1 follows below):
>
> 1) Only the embedding layer and LM head are trained.
> 2) All parameters are trained.
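The sketch below illustrates how step 1 could be set up: grow the embedding matrix to the expanded vocabulary, then freeze everything except the input embeddings and the LM head. The repo ids and Llama-style parameter-name prefixes are assumptions; the actual training code is not published in this card.

```python
# A sketch of step 1 under stated assumptions: base checkpoint and expanded
# tokenizer loaded from the Hub, Llama-style parameter names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("upstage/solar-1-mini-tokenizer")
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-v1.0", torch_dtype=torch.bfloat16
)

# Grow the embedding matrix and LM head to the expanded 64k vocabulary.
model.resize_token_embeddings(len(tokenizer))

# Step 1: train only the input embeddings and the LM head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("model.embed_tokens", "lm_head"))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Step-1 trainable parameters: {trainable:,}")
```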
**Vocab Expansion**
Vocabulary expansion was conducted on an edited version of [upstage/solar-1-mini-tokenizer](https://huggingface.co/upstage/solar-1-mini-tokenizer), which is a superset of the original Solar tokenizer.
| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Solar | 32000 | SentencePiece BPE |
| **solar-1-mini-tokenizer** | 64000 | SentencePiece BPE; adds Korean/Japanese vocabulary |
**Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**
- SOLAR-10.7B: 26 tokens
- Solar-Ko-Recovery: 7 tokens
| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '날', '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '좋', '네', '요', '.']` |
| Solar-Ko-Recovery | `['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.']` |
**Tokenizing "Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!"**
- SOLAR-10.7B: 22 tokens
- Solar-Ko-Recovery: 22 tokens
| Model | Tokens |
| --- | --- |
| SOLAR-10.7B | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |
| Solar-Ko-Recovery | `['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!']` |
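The counts above can be reproduced with a short comparison script; the sketch below assumes the repo ids shown (the recovery model's id is inferred from this card's title).

```python
# Reproduce the token counts above (repo ids assumed; adjust as needed).
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
recovery = AutoTokenizer.from_pretrained("beomi/Solar-Ko-Recovery-11B")

for text in [
    "안녕하세요, 오늘은 날씨가 좋네요.",
    "Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!",
]:
    print(text)
    print("  SOLAR-10.7B      :", len(base.tokenize(text)), "tokens")
    print("  Solar-Ko-Recovery:", len(recovery.tokenize(text)), "tokens")
```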
# LICENSE
Apache 2.0
# **Model Benchmark**
## LM Eval Harness - Korean
- Used EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- 5-shot scores (a sample invocation is sketched below)
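A sketch of the evaluation call, assuming the lm-evaluation-harness v0.4+ Python API and the task names listed in the table below; the exact command used for this card is not published, and the repo id is assumed.

```python
# 5-shot evaluation sketch with lm-evaluation-harness (assumptions: v0.4+
# API, task names as in the results table, and the repo id shown).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=beomi/Solar-Ko-Recovery-11B,dtype=bfloat16",
    tasks=[
        "haerae", "kmmlu_direct", "kobest_boolq",
        "kobest_copa", "kobest_hellaswag", "kobest_sentineg",
    ],
    num_fewshot=5,
)
print(results["results"])
```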
| Tasks | Metric | Value | | Stderr |
|----------------------------------------------------------|-----------|--------:|---|--------:|
|haerae |acc_norm | 0.7874 |± | 0.0118 |
| - haerae_general_knowledge |acc | 0.5000 |± | 0.0378 |
| - haerae_history |acc | 0.8723 |± | 0.0244 |
| - haerae_loan_word |acc | 0.8402 |± | 0.0283 |
| - haerae_rare_word |acc | 0.8346 |± | 0.0185 |
| - haerae_standard_nomenclature |acc | 0.8301 |± | 0.0305 |
|kmmlu_direct |exact_match| 0.4205 |± | 0.0026 |
| - kmmlu_direct_accounting |exact_match| 0.3700 |± | 0.0485 |
| - kmmlu_direct_agricultural_sciences |exact_match| 0.3140 |± | 0.0147 |
| - kmmlu_direct_aviation_engineering_and_maintenance |exact_match| 0.3870 |± | 0.0154 |
| - kmmlu_direct_biology |exact_match| 0.3510 |± | 0.0151 |
| - kmmlu_direct_chemical_engineering |exact_match| 0.3910 |± | 0.0154 |
| - kmmlu_direct_chemistry |exact_match| 0.4000 |± | 0.0200 |
| - kmmlu_direct_civil_engineering |exact_match| 0.4010 |± | 0.0155 |
| - kmmlu_direct_computer_science |exact_match| 0.6520 |± | 0.0151 |
| - kmmlu_direct_construction |exact_match| 0.3080 |± | 0.0146 |
| - kmmlu_direct_criminal_law |exact_match| 0.3100 |± | 0.0328 |
| - kmmlu_direct_ecology |exact_match| 0.4660 |± | 0.0158 |
| - kmmlu_direct_economics |exact_match| 0.5385 |± | 0.0439 |
| - kmmlu_direct_education |exact_match| 0.6200 |± | 0.0488 |
| - kmmlu_direct_electrical_engineering |exact_match| 0.3000 |± | 0.0145 |
| - kmmlu_direct_electronics_engineering |exact_match| 0.4740 |± | 0.0158 |
| - kmmlu_direct_energy_management |exact_match| 0.3560 |± | 0.0151 |
| - kmmlu_direct_environmental_science |exact_match| 0.2980 |± | 0.0145 |
| - kmmlu_direct_fashion |exact_match| 0.4470 |± | 0.0157 |
| - kmmlu_direct_food_processing |exact_match| 0.3690 |± | 0.0153 |
| - kmmlu_direct_gas_technology_and_engineering |exact_match| 0.3000 |± | 0.0145 |
| - kmmlu_direct_geomatics |exact_match| 0.3820 |± | 0.0154 |
| - kmmlu_direct_health |exact_match| 0.5700 |± | 0.0498 |
| - kmmlu_direct_industrial_engineer |exact_match| 0.3830 |± | 0.0154 |
| - kmmlu_direct_information_technology |exact_match| 0.6090 |± | 0.0154 |
| - kmmlu_direct_interior_architecture_and_design |exact_match| 0.5440 |± | 0.0158 |
| - kmmlu_direct_korean_history |exact_match| 0.3800 |± | 0.0488 |
| - kmmlu_direct_law |exact_match| 0.4670 |± | 0.0158 |
| - kmmlu_direct_machine_design_and_manufacturing |exact_match| 0.3960 |± | 0.0155 |
| - kmmlu_direct_management |exact_match| 0.5030 |± | 0.0158 |
| - kmmlu_direct_maritime_engineering |exact_match| 0.4283 |± | 0.0202 |
| - kmmlu_direct_marketing |exact_match| 0.7460 |± | 0.0138 |
| - kmmlu_direct_materials_engineering |exact_match| 0.4020 |± | 0.0155 |
| - kmmlu_direct_math |exact_match| 0.2867 |± | 0.0262 |
| - kmmlu_direct_mechanical_engineering |exact_match| 0.3490 |± | 0.0151 |
| - kmmlu_direct_nondestructive_testing |exact_match| 0.3760 |± | 0.0153 |
| - kmmlu_direct_patent |exact_match| 0.3700 |± | 0.0485 |
| - kmmlu_direct_political_science_and_sociology |exact_match| 0.5300 |± | 0.0289 |
| - kmmlu_direct_psychology |exact_match| 0.4470 |± | 0.0157 |
| - kmmlu_direct_public_safety |exact_match| 0.3520 |± | 0.0151 |
| - kmmlu_direct_railway_and_automotive_engineering |exact_match| 0.3220 |± | 0.0148 |
| - kmmlu_direct_real_estate |exact_match| 0.4350 |± | 0.0351 |
| - kmmlu_direct_refrigerating_machinery |exact_match| 0.3240 |± | 0.0148 |
| - kmmlu_direct_social_welfare |exact_match| 0.4970 |± | 0.0158 |
| - kmmlu_direct_taxation |exact_match| 0.3800 |± | 0.0344 |
| - kmmlu_direct_telecommunications_and_wireless_technology|exact_match| 0.5480 |± | 0.0157 |
|kobest_boolq |acc | 0.9202 |± | 0.0072 |
| |f1 | 0.9202 |± |N/A |
|kobest_copa |acc | 0.8680 |± | 0.0107 |
| |f1 | 0.8678 |± |N/A |
|kobest_hellaswag |acc | 0.5560 |± | 0.0222 |
| |f1 | 0.5520 |± |N/A |
| |acc_norm | 0.6540 |± | 0.0213 |
|kobest_sentineg |acc | 0.9824 |± | 0.0066 |
| |f1 | 0.9824 |± |N/A |
## Citation
TBD
## Acknowledgements
- Training support was provided by the [TPU Research Cloud](https://sites.research.google/trc/) program.