chrisociepa committed on
Commit 9f72096
1 Parent(s): d2ae294

Update README.md

Files changed (1)
  1. README.md +26 -30
README.md CHANGED
@@ -9,7 +9,7 @@ inference:
---

<p align="center">
- <img src="https://huggingface.co/speakleash/Bielik-7B-v0.1/raw/main/speakleash_cyfronet.png">
+ <img src="https://huggingface.co/speakleash/Bielik-11B-v2/raw/main/speakleash_cyfronet.png">
</p>

# Bielik-11B-v2
@@ -36,7 +36,7 @@ Bielik-11B-v2 has been trained with [Megatron-LM](https://github.com/NVIDIA/Mega

The model training was conducted on the Helios Supercomputer at the ACK Cyfronet AGH, utilizing 256 NVidia GH200 cards.

- The training dataset was composed of Polish texts collected and made available through the [SpeakLeash](https://speakleash.org/) project as well as a part of the [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). We used 200 billion tokens for two epochs of training.
+ The training dataset was composed of Polish texts collected and made available through the [SpeakLeash](https://speakleash.org/) project, as well as a subset of CommonCrawl data. We used 200 billion tokens (over 700 GB of plain text) for two epochs of training.

### Model description:

@@ -44,7 +44,7 @@ The training dataset was composed of Polish texts collected and made available t
* **Language:** Polish
* **Model type:** causal decoder-only
* **Initialized from:** [Mistral-7B-v0.2](https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar)
- * **License:** Apache 2.0 (commercial use allowed)
+ * **License:** Apache 2.0
* **Model ref:** speakleash:45b6efdb701991181a05968fc53d2a8e

### Quality evaluation
@@ -100,29 +100,28 @@ The benchmark evaluates models in NLP tasks like sentiment analysis, categorizat

| Model | Parameters (B) | Average |
|------------------------|------------|---------|
- | Qwen2-72B | 72 | 65.76 |
- | Meta-Llama-3-70B | 70 | 60.87 |
- | Meta-Llama-3.1-70B | 70 | 60.39 |
- | Mixtral-8x22B-v0.1 | 141 | 59.95 |
- | Qwen1.5-72B | 72 | 59.94 |
- | Qwen1.5-32B | 32 | 57.34 |
- | **Bielik-11B-v2** | **11** | **56.61** |
- | Qwen2-7B | 7 | 48.75 |
- | Mistral-Nemo-Base-2407 | 12 | 46.15 |
- | SOLAR-10.7B-v1.0 | 10.7 | 46.04 |
- | internlm2-20b | 20 | 45.98 |
- | Meta-Llama-3.1-8B | 8 | 42.79 |
- | Meta-Llama-3-8B | 8 | 42.40 |
- | Mistral-7B-v0.2 | 7 | 37.20 |
- | Bielik-7B-v0.1 | 7 | 33.78 |
- | Qra-13b | 13 | 33.71 |
- | Qra-7b | 7 | 16.09 |
-
- The results from the Open PL LLM Leaderboard show that the Bielik-11B-v2 model, with 11 billion parameters, achieved an average score of 56.61. This makes it the best performing model among those under 20B parameters, outperforming the second-best model in this category by an impressive 8 percentage points. This significant lead not only places it ahead of its predecessor, the Bielik-7B-v0.1 (which scored 33.78), but also demonstrates its superiority over other larger models. The substantial improvement highlights the remarkable advancements and optimizations made in this newer version.
-
- Other Polish models listed include Qra-13b and Qra-7b, scoring 33.71 and 16.09 respectively, indicating that Bielik-11B-v2 outperforms these models by a considerable margin.
-
- Additionally, the Bielik-11B-v2 was initialized from the weights of Mistral-7B-v0.2, which itself scored 37.20, further demonstrating the effective enhancements incorporated into the Bielik-11B-v2 model.
+ | Meta-Llama-3-70B | 70 | 62.07 |
+ | Qwen1.5-72B | 72 | 61.11 |
+ | Meta-Llama-3.1-70B | 70 | 60.87 |
+ | Mixtral-8x22B-v0.1 | 141 | 60.75 |
+ | Qwen1.5-32B | 32 | 58.71 |
+ | **Bielik-11B-v2** | **11** | **58.14** |
+ | Qwen2-7B | 7 | 49.39 |
+ | SOLAR-10.7B-v1.0 | 10.7 | 47.54 |
+ | Mistral-Nemo-Base-2407 | 12 | 47.28 |
+ | internlm2-20b | 20 | 47.15 |
+ | Meta-Llama-3.1-8B | 8 | 43.77 |
+ | Meta-Llama-3-8B | 8 | 43.30 |
+ | Mistral-7B-v0.2 | 7 | 38.81 |
+ | Bielik-7B-v0.1 | 7 | 34.34 |
+ | Qra-13b | 13 | 33.90 |
+ | Qra-7b | 7 | 16.60 |
+
+ The results from the Open PL LLM Leaderboard show that the Bielik-11B-v2 model, with 11 billion parameters, achieved an average score of 58.14. This makes it the best performing model among those under 20B parameters, outperforming the second-best model in this category by an impressive 8.75 percentage points. This significant lead not only places it ahead of its predecessor, the Bielik-7B-v0.1 (which scored 34.34), but also demonstrates its superiority over other larger models. The substantial improvement highlights the remarkable advancements and optimizations made in this newer version.
+
+ Other Polish models listed include Qra-13b and Qra-7b, scoring 33.90 and 16.60 respectively, indicating that Bielik-11B-v2 outperforms these models by a considerable margin.
+
+ Additionally, the Bielik-11B-v2 was initialized from the weights of Mistral-7B-v0.2, which itself scored 38.81, further demonstrating the effective enhancements incorporated into the Bielik-11B-v2 model.

### Open LLM Leaderboard

@@ -151,9 +150,6 @@ Bielik-11B-v2 is not intended for deployment without fine-tuning. It should not

Bielik-11B-v2 can produce factually incorrect output, and should not be relied on to produce factually accurate data. Bielik-11B-v2 was trained on various public datasets. While great efforts have been taken to clear the training data, it is possible that this model can generate lewd, false, biased or otherwise offensive outputs.

- ## License
-
- The model is licensed under Apache 2.0, which allows for commercial use.

## Citation
Please cite this model using the following format:
@@ -169,7 +165,7 @@ Please cite this model using the following format:
}
@unpublished{Bielik11Bv2a,
author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof},
- title = {Bielik: A Family of Large Language Models for the Polish Language Development, Insights, and Evaluation},
+ title = {Bielik: A Family of Large Language Models for the Polish Language - Development, Insights, and Evaluation},
year = {2024},
}
```
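For readers of the updated card, a minimal usage sketch follows. It is an illustration, not part of the commit: it assumes the repository id `speakleash/Bielik-11B-v2` (as used in the image URL added above) and the standard `transformers` causal-LM loading path for a base, non-instruction-tuned model.

```python
# Illustrative sketch: loading and prompting the base model described in this card.
# Assumptions: repo id "speakleash/Bielik-11B-v2" (taken from the image URL in the diff)
# and the standard Hugging Face causal-LM API; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-11B-v2"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 is a reasonable default on recent GPUs
    device_map="auto",           # requires the accelerate package
)

# Base model, no chat template: plain text completion with a Polish prompt
# ("The highest peak in Poland is").
prompt = "Najwyższym szczytem Polski jest"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Per the limitations section of the card, the base model is not intended for deployment without fine-tuning, so a sketch like this is only suitable for quick inspection of raw completions.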