cjvt
/

dvres committed on
Commit 3df7805
1 Parent(s): 842ae42

Update README.md

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -11,25 +11,27 @@ library_name: transformers
 
  # OPT_GaMS 1B
 
- We proudly introduce the familly of GaMS (Generative Model for Slovene) models. The 1B version is based on [Facebook's OPT model](https://huggingface.co/facebook/opt-1.3b) and is adapted for Slovene. OPT_GaMS models use original OPT tokenizer.
+ We proudly present the familly of GaMS (Generative Model for Slovene) models. The 1B version is based on [Facebook's OPT model](https://huggingface.co/facebook/opt-1.3b) and is adapted for Slovene. OPT_GaMS models use original OPT tokenizer.
 
  ## Acknowledgment
 
- The model was developed within the [PoVeJMo](https://povejmo.si) research program (Adaptive Natural Language Processing with Large Language Models}; Prilagodljiva obdelava naravnega jezika s pomočjo velikih jezikovnih modelov), particularly within the research project titled SloLLaMai -- Open-access computationally efficient models for Slovenian, funded within the Recovery and Resilience Plan (NOO; Načrt za okrevanje in odpornost) by the Slovenian Research and Innovation Agency (ARIS) and NextGenerationEU. The authors also acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P6-0411 -- Language Resources and Technologies for Slovene).
+ The model was developed within the [PoVeJMo](https://www.cjvt.si/povejmo/en/project/) research program (Adaptive Natural Language Processing with Large Language Models), particularly within the research project titled SloLLaMai -- Open-access computationally efficient models for Slovenian. The program is funded within the Recovery and Resilience Plan by the Slovenian Research and Innovation Agency (ARIS) and NextGenerationEU. The authors also acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P6-0411 -- Language Resources and Technologies for Slovene).
+
+ We thank everyone who worked on data collection and preparation, enabling us to train our model. Special thanks go to Nikola Ljubešić, Tjaša Arčon, Jaka Čibej, Simon Krek, Tomaž Erjavec and Iztok Kosem.
 
  ## Basic information
 
- - **Developed by:** team of researchers at University of Ljubljana, Faculty for Computer and Information Science and XLAB.doo. Team members: Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič and Marko Robnik-Šikonja.
- - **Language(s:)** Slovene (primary), English, Croatian, Bosnian and Serbian (secondary)
+ - **Developed by:** team of researchers at the University of Ljubljana, Faculty for Computer and Information Science and XLAB.doo. Team members: Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič and Marko Robnik-Šikonja.
+ - **Languages:** Slovene (primary), English, Croatian, Bosnian and Serbian (secondary)
  - **License:** Apache 2.0
  - **Repository:** https://github.com/SloLama/NeMo
  - **Paper:** https://www.sdjt.si/wp/wp-content/uploads/2024/09/JT-DH-2024_Vres_Bozic_Potocnik_Martincic_Robnik.pdf
 
  ## Intended usage
 
- This version of the model is quite small and lacks instruction and safety tuning. Hence, using it as a general purpose model is **STRONGLY DISCOURAGED!!!** The model might also contain certain biases. We do not recommend usage of this model in any other language than Slovene.
+ This version of the model is quite small and lacks instruction and safety tuning. Hence, using it as a general-purpose model is **STRONGLY DISCOURAGED!** The model might also contain certain biases. We do not recommend the usage of this model in any other language than Slovene.
 
- The model can be efficiently tuned for specific use cases as suggested by promising results of fine-tuned models on SuperGLUE and SI-NLI benchmarks.
+ The model can be efficiently tuned for specific use cases, as suggested by promising results of fine-tuned models on SuperGLUE and SI-NLI benchmarks.
 
  ## How to Get Started with the Model
 
@@ -72,7 +74,7 @@ for seq in sequences:
  The model was additionally pretrained on the following Slovene, English, and Croatian-Bosnian-Serbian (CBS) corpora:
  | Corpus | Language | # Tokens | Percentage |
  | :----- | :------- | :------: | :--------: |
- | Metafida | Slovene | 6.59 B | 13.89 % |
+ | MetaFida | Slovene | 6.59 B | 13.89 % |
  | KAS | Slovene | 3.61 B | 7.62 % |
  | Trendi | Slovene | 1.4 B | 2.96 % |
  | mC4 | Slovene | 5.5 B | 11.6 % |
@@ -88,11 +90,11 @@ The total size of additional training data is **47.44 B** tokens.
 
  ### Training Procedure
 
- The model was trained using NeMo framework on Slovene HPC Vega, utilizing 64 A100 GPUs at once. Training took approximately 16 hours. The model was trained with batch size 1024 (2 million tokens) using Adam optimizer and cosine learning rate scheduler with 1000 warmup and constant steps.
+ The model was trained using the NeMo framework on Slovene HPC Vega, utilizing 64 A100 GPUs simultaneously. Training took approximately 16 hours. The model was trained with batch size 1024 (2 million tokens) using Adam optimizer and cosine learning rate scheduler with 1000 warmup and constant steps.
 
  ## Evaluation
 
- The model was evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models was evaluated on imporved version of Slovenian-LLM-eval introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmark (except for the versions with finetuned in the name).
+ The models were evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of the Slovenian-LLM-eval introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmark (except for the versions with finetuned in the name).
 
  ### SuperGLUE results
 
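The Training Procedure paragraph in this diff states a batch size of 1024 (2 million tokens) and a cosine learning rate scheduler with 1000 warmup and constant steps. The Python sketch below illustrates one reading of that setup. It is not taken from the SloLama/NeMo repository. It assumes 1000 linear warmup steps, then a cosine decay from a peak to a minimum learning rate, then a final 1000 steps held constant at the minimum. For the token arithmetic it assumes a sequence length of 2048 tokens, which is the context length of the base OPT model. The function names, as well as the peak LR, minimum LR and total step count in the example, are hypothetical.

```python
import math


def lr_at_step(step: int, max_steps: int, peak_lr: float, min_lr: float,
               warmup_steps: int = 1000, constant_steps: int = 1000) -> float:
    """Hypothetical warmup -> cosine decay -> constant schedule (0-based step)."""
    decay_steps = max_steps - warmup_steps - constant_steps
    if decay_steps <= 0:
        raise ValueError("max_steps must exceed warmup_steps + constant_steps")
    if step < warmup_steps:
        # Linear warmup: reaches peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step >= warmup_steps + decay_steps:
        # Final phase: hold the learning rate constant at min_lr.
        return min_lr
    # Cosine decay from peak_lr (progress = 0) towards min_lr (progress -> 1).
    progress = (step - warmup_steps) / decay_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def tokens_per_batch(batch_size: int = 1024, seq_len: int = 2048) -> int:
    """Tokens per optimizer step, assuming every sequence is packed to seq_len."""
    return batch_size * seq_len


if __name__ == "__main__":
    print(tokens_per_batch())  # 2097152
    # Illustrative values only; the README does not state the LRs or total steps.
    for step in (0, 999, 5000, 9500):
        print(step, lr_at_step(step, max_steps=10_000, peak_lr=3e-4, min_lr=3e-5))
```

Under the 2048-token assumption, 1024 × 2048 = 2,097,152 tokens per step, which is consistent with the rounded "2 million tokens" in the README.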