chrisociepa committed on
Commit 9f72096
1 Parent(s): d2ae294

Update README.md

Files changed (1)
  1. README.md +26 -30
README.md CHANGED
@@ -9,7 +9,7 @@ inference:
---

<p align="center">
- <img src="https://huggingface.co/speakleash/Bielik-7B-v0.1/raw/main/speakleash_cyfronet.png">
+ <img src="https://huggingface.co/speakleash/Bielik-11B-v2/raw/main/speakleash_cyfronet.png">
</p>

# Bielik-11B-v2
@@ -36,7 +36,7 @@ Bielik-11B-v2 has been trained with [Megatron-LM](https://github.com/NVIDIA/Mega

The model training was conducted on the Helios Supercomputer at the ACK Cyfronet AGH, utilizing 256 NVidia GH200 cards.

- The training dataset was composed of Polish texts collected and made available through the [SpeakLeash](https://speakleash.org/) project as well as a part of the [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B). We used 200 billion tokens for two epochs of training.
+ The training dataset was composed of Polish texts collected and made available through the [SpeakLeash](https://speakleash.org/) project, as well as a subset of CommonCrawl data. We used 200 billion tokens (over 700 GB of plain text) for two epochs of training.

### Model description:

@@ -44,7 +44,7 @@ The training dataset was composed of Polish texts collected and made available t
* **Language:** Polish
* **Model type:** causal decoder-only
* **Initialized from:** [Mistral-7B-v0.2](https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar)
- * **License:** Apache 2.0 (commercial use allowed)
+ * **License:** Apache 2.0
* **Model ref:** speakleash:45b6efdb701991181a05968fc53d2a8e

### Quality evaluation
@@ -100,29 +100,28 @@ The benchmark evaluates models in NLP tasks like sentiment analysis, categorizat

| Model | Parameters (B) | Average |
|------------------------|------------|---------|
- | Qwen2-72B | 72 | 65.76 |
- | Meta-Llama-3-70B | 70 | 60.87 |
- | Meta-Llama-3.1-70B | 70 | 60.39 |
- | Mixtral-8x22B-v0.1 | 141 | 59.95 |
- | Qwen1.5-72B | 72 | 59.94 |
- | Qwen1.5-32B | 32 | 57.34 |
- | **Bielik-11B-v2** | **11** | **56.61** |
- | Qwen2-7B | 7 | 48.75 |
- | Mistral-Nemo-Base-2407 | 12 | 46.15 |
- | SOLAR-10.7B-v1.0 | 10.7 | 46.04 |
- | internlm2-20b | 20 | 45.98 |
- | Meta-Llama-3.1-8B | 8 | 42.79 |
- | Meta-Llama-3-8B | 8 | 42.40 |
- | Mistral-7B-v0.2 | 7 | 37.20 |
- | Bielik-7B-v0.1 | 7 | 33.78 |
- | Qra-13b | 13 | 33.71 |
- | Qra-7b | 7 | 16.09 |
-
- The results from the Open PL LLM Leaderboard show that the Bielik-11B-v2 model, with 11 billion parameters, achieved an average score of 56.61. This makes it the best performing model among those under 20B parameters, outperforming the second-best model in this category by an impressive 8 percentage points. This significant lead not only places it ahead of its predecessor, the Bielik-7B-v0.1 (which scored 33.78), but also demonstrates its superiority over other larger models. The substantial improvement highlights the remarkable advancements and optimizations made in this newer version.
-
- Other Polish models listed include Qra-13b and Qra-7b, scoring 33.71 and 16.09 respectively, indicating that Bielik-11B-v2 outperforms these models by a considerable margin.
-
- Additionally, the Bielik-11B-v2 was initialized from the weights of Mistral-7B-v0.2, which itself scored 37.20, further demonstrating the effective enhancements incorporated into the Bielik-11B-v2 model.
+ | Meta-Llama-3-70B | 70 | 62.07 |
+ | Qwen1.5-72B | 72 | 61.11 |
+ | Meta-Llama-3.1-70B | 70 | 60.87 |
+ | Mixtral-8x22B-v0.1 | 141 | 60.75 |
+ | Qwen1.5-32B | 32 | 58.71 |
+ | **Bielik-11B-v2** | **11** | **58.14** |
+ | Qwen2-7B | 7 | 49.39 |
+ | SOLAR-10.7B-v1.0 | 10.7 | 47.54 |
+ | Mistral-Nemo-Base-2407 | 12 | 47.28 |
+ | internlm2-20b | 20 | 47.15 |
+ | Meta-Llama-3.1-8B | 8 | 43.77 |
+ | Meta-Llama-3-8B | 8 | 43.30 |
+ | Mistral-7B-v0.2 | 7 | 38.81 |
+ | Bielik-7B-v0.1 | 7 | 34.34 |
+ | Qra-13b | 13 | 33.90 |
+ | Qra-7b | 7 | 16.60 |
+
+ The results from the Open PL LLM Leaderboard show that the Bielik-11B-v2 model, with 11 billion parameters, achieved an average score of 58.14. This makes it the best performing model among those under 20B parameters, outperforming the second-best model in this category by an impressive 8.75 percentage points. This significant lead not only places it ahead of its predecessor, the Bielik-7B-v0.1 (which scored 34.34), but also demonstrates its superiority over other larger models. The substantial improvement highlights the remarkable advancements and optimizations made in this newer version.
+
+ Other Polish models listed include Qra-13b and Qra-7b, scoring 33.90 and 16.60 respectively, indicating that Bielik-11B-v2 outperforms these models by a considerable margin.
+
+ Additionally, the Bielik-11B-v2 was initialized from the weights of Mistral-7B-v0.2, which itself scored 38.81, further demonstrating the effective enhancements incorporated into the Bielik-11B-v2 model.

### Open LLM Leaderboard

@@ -151,9 +150,6 @@ Bielik-11B-v2 is not intended for deployment without fine-tuning. It should not

Bielik-11B-v2 can produce factually incorrect output, and should not be relied on to produce factually accurate data. Bielik-11B-v2 was trained on various public datasets. While great efforts have been taken to clear the training data, it is possible that this model can generate lewd, false, biased or otherwise offensive outputs.

- ## License
-
- The model is licensed under Apache 2.0, which allows for commercial use.

## Citation
Please cite this model using the following format:
@@ -169,7 +165,7 @@ Please cite this model using the following format:
}
@unpublished{Bielik11Bv2a,
author = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof},
- title = {Bielik: A Family of Large Language Models for the Polish Language Development, Insights, and Evaluation},
+ title = {Bielik: A Family of Large Language Models for the Polish Language - Development, Insights, and Evaluation},
year = {2024},
}
```
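For readers of the updated card, a minimal usage sketch follows. It is an illustration, not part of the commit: it assumes the repository id `speakleash/Bielik-11B-v2` (as used in the image URL added above) and the standard `transformers` causal-LM loading path for a base, non-instruction-tuned model.

```python
# Illustrative sketch: loading and prompting the base model described in this card.
# Assumptions: repo id "speakleash/Bielik-11B-v2" (taken from the image URL in the diff)
# and the standard Hugging Face causal-LM API; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-11B-v2"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 is a reasonable default on recent GPUs
    device_map="auto",           # requires the accelerate package
)

# Base model, no chat template: plain text completion with a Polish prompt
# ("The highest peak in Poland is").
prompt = "Najwyższym szczytem Polski jest"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Per the limitations section of the card, the base model is not intended for deployment without fine-tuning, so a sketch like this is only suitable for quick inspection of raw completions.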