MinistryofDigitalAffairs committed
Commit d30d9c4 · verified · 1 Parent(s): 4085b28

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
  license: apache-2.0
+ language:
+ - pl
  ---
  <p align="center">
  <img src="https://pllum.org.pl/_nuxt/PLLuM_logo_RGB_color.DXNEc-VR.png">
@@ -50,7 +52,7 @@ Below is a summary of the main PLLuM models, including their licenses, bases, an

  ### Model Development
  - **Pretraining**: All models were pretrained or continued-pretrained on large-scale Polish corpora (up to 150B tokens) plus a range of additional Slavic/Baltic and English texts.
- - **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions,” converted instructions from premium Polish corpora, and synthetic instructions generated by strong LLMs.
+ - **Instruction Fine-Tuning**: We refined the models on manually curated Polish “organic instructions” (approx. 40k), converted instructions from premium Polish corpora (approx. 50k), and synthetic instructions generated by strong LLMs (approx. 10k).
  - **Alignment and Preference Learning**: Manually annotated preference data taught the models to produce safer, balanced, and contextually appropriate responses, even in adversarial or sensitive cases.
  - **Domain-Specific Adaptations**: Specialized RAG-based (Retrieval Augmented Generation) models were developed for tasks like public administration, demonstrating strong performance in complex information retrieval and question answering.

@@ -229,4 +231,4 @@ We welcome feedback, collaboration, and further exploration of PLLuM models!
  Project financed by the Minister of Digital Affairs under the targeted subsidy No. 1/WI/DBiI/2023: *“Responsible development of the open large language model PLLuM (Polish Large Language Model) to support breakthrough technologies in the public and economic sector, including an open, Polish-language intelligent assistant for petitioners.”*

  **Funding Amount:** 14,504,392.00 PLN
- **Contract Signing Date:** 2024-01-22
+ **Contract Signing Date:** 2024-01-22
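
For reference, this is how the model card front matter reads after this commit, assembled from the first hunk above; only the `language` field is new, and the README body below the front matter is otherwise unchanged:

```yaml
---
license: apache-2.0   # unchanged from the previous revision
language:             # added in this commit
- pl                  # Polish
---
```

Setting the `language` tag lets the Hugging Face Hub index the repository under Polish-language models, so it shows up when users filter models by language.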