Update README.md
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: apache-2.0
+language:
+- pl
 ---
 <p align="center">
 <img src="https://pllum.org.pl/_nuxt/PLLuM_logo_RGB_color.DXNEc-VR.png">
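For context on the hunk above: the block between the `---` markers is YAML front matter that the Hugging Face Hub parses into model card metadata (the license badge, the language filter). Below is a minimal sketch of reading the newly added `language` field; the PyYAML part only re-parses the front matter shown in this diff, while the Hub round-trip uses a placeholder repo id that is an assumption, not something named in this commit:

```python
import yaml  # pip install pyyaml
from huggingface_hub import ModelCard  # pip install huggingface_hub

# Front matter exactly as it stands after this commit (copied from the diff).
front_matter = """\
license: apache-2.0
language:
- pl
"""

meta = yaml.safe_load(front_matter)
print(meta["license"])   # 'apache-2.0'
print(meta["language"])  # ['pl'] -- the field added by this commit

# Equivalent check against the published card. The repo id below is a
# hypothetical placeholder, not taken from this diff.
card = ModelCard.load("CYFRAGOVPL/PLLuM-12B-instruct")
print(card.data.language)
```

Declaring `pl` here is what makes the model discoverable under the Hub's Polish language filter.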
@@ -50,7 +52,7 @@ Below is a summary of the main PLLuM models, including their licenses, bases, an
 
 ### Model Development
 - **Pretraining**: All models were pretrained or continued-pretrained on large-scale Polish corpora (up to 150B tokens) plus a range of additional Slavic/Baltic and English texts.
-- **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions
+- **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions” (approx. 40k), converted instructions from premium Polish corpora (approx. 50k), and synthetic instructions generated by strong LLMs (approx. 10k).
 - **Alignment and Preference Learning**: Manually annotated preference data taught the models to produce safer, balanced, and contextually appropriate responses, even in adversarial or sensitive cases.
 - **Domain-Specific Adaptations**: Specialized RAG-based (Retrieval Augmented Generation) models were developed for tasks like public administration, demonstrating strong performance in complex information retrieval and question answering.
 
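The bullets in the hunk above describe the training pipeline rather than an inference API, but a short usage sketch may help readers who want to try one of the resulting instruction-tuned checkpoints. This is assumption-laden: the repo id is a hypothetical placeholder (no checkpoint is named in this diff), and `apply_chat_template` presumes the tokenizer ships a chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical placeholder id; substitute the actual PLLuM checkpoint you use.
model_id = "CYFRAGOVPL/PLLuM-12B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A Polish instruction, matching the models' primary training language.
messages = [{"role": "user", "content": "Czym jest PLLuM?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```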
@@ -229,4 +231,4 @@ We welcome feedback, collaboration, and further exploration of PLLuM models!
 Project financed by the Minister of Digital Affairs under the targeted subsidy No. 1/WI/DBiI/2023: *“Responsible development of the open large language model PLLuM (Polish Large Language Model) to support breakthrough technologies in the public and economic sector, including an open, Polish-language intelligent assistant for petitioners.”*
 
 **Funding Amount:** 14,504,392.00 PLN
-**Contract Signing Date:** 2024-01-22
+**Contract Signing Date:** 2024-01-22