Update README.md

README.md CHANGED
@@ -4,6 +4,9 @@ language:
 - tr
 library_name: transformers
 pipeline_tag: text2text-generation
+datasets:
+- batubayk/TR-News
+- mlsum
 ---


@@ -19,6 +22,15 @@ The model is shared with the public to be used solely for non-commercial academic research purposes.

 ## Model Details

+- 36 encoder and decoder layers
+- 16 attention heads
+- Token embeddings are 1024-dimensional
+- The multi-layer perceptron layers have 2816 hidden dimensions and employ Gated GeLU activations
+- The parameters of the input and classification layers are not shared
+- 1.1B parameters
+- Uses a unigram subword tokenizer trained on 10GB of text consisting of random subsets of OSCAR, OPUS, and Wikipedia
+- Vocabulary size: 32000 tokens + 128 special tokens
+
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
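As a rough sanity check, the architecture numbers above are consistent with the stated 1.1B parameters. The sketch below assumes 36 encoder *and* 36 decoder layers, a T5-style gated FFN with three projection matrices, no bias terms, and unshared input/classification embeddings; these are inferences from the card, not its exact accounting.

```python
# Back-of-the-envelope parameter count implied by the model card's numbers.
# Assumptions (not stated explicitly in the card): 36 encoder AND 36 decoder
# layers, gated FFN with three projections (wi_0, wi_1, wo), no biases.
d_model, d_ff = 1024, 2816
vocab = 32000 + 128                  # tokens + special tokens

attn = 4 * d_model * d_model         # Q, K, V, O projections
ffn = 3 * d_model * d_ff             # gated GeLU: two input projections + output
enc_layer = attn + ffn               # self-attention + FFN
dec_layer = 2 * attn + ffn           # self-attention + cross-attention + FFN

embeddings = 2 * vocab * d_model     # input embeddings + unshared classification layer
total = 36 * enc_layer + 36 * dec_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # -> 1.14B parameters
```

The result lands close to the quoted 1.1B; small discrepancies would come from layer norms, relative position biases, and similar terms omitted here.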
@@ -30,7 +42,7 @@ The model is shared with the public to be used solely for non-commercial academic research purposes.
 - **Language(s) (NLP):** Turkish
 - **License:** The model is shared with the public to be used solely for non-commercial academic research purposes.

-### Model Sources
+### Model Sources

 <!-- Provide the basic links for the model. -->

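The Model Details section mentions a unigram subword tokenizer. As a toy illustration of how unigram segmentation works, the sketch below runs Viterbi decoding over a tiny made-up vocabulary with made-up probabilities; the real tokenizer was trained on 10GB of OSCAR, OPUS, and Wikipedia text and has 32000 tokens.

```python
import math

# Toy unigram subword segmentation: pick the split of a word into vocabulary
# pieces that maximizes the sum of log-probabilities (Viterbi decoding).
# Vocabulary and probabilities are illustrative only.
vocab = {"anka": 0.04, "ra": 0.05, "an": 0.06, "ka": 0.05,
         "r": 0.01, "a": 0.08, "n": 0.02, "k": 0.01}

def segment(word):
    # best[i] = (score, split point) for the prefix word[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(word)
    for i in range(1, len(word) + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in vocab:
                score = best[j][0] + math.log(vocab[piece])
                if score > best[i][0]:
                    best[i] = (score, j)
    # Backtrack through the split points to recover the pieces.
    pieces, i = [], len(word)
    while i > 0:
        j = best[i][1]
        pieces.append(word[j:i])
        i = j
    return pieces[::-1]

print(segment("ankara"))  # -> ['anka', 'ra']
```

A real unigram tokenizer (e.g. SentencePiece) additionally learns the piece probabilities via EM and prunes the vocabulary; the decoding step is the same idea.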
@@ -51,9 +63,9 @@ This model can be used for research purposes. You give some text and this model

 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

-This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your
+This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your custom tasks involving the Turkish language.

-This model can be further trained
+This model can be further trained to be more helpful, less harmful, and better suited to dialog use cases.

 ### Out-of-Scope Use

@@ -82,14 +94,28 @@ We refer to the Flan-T5's [official model card](https://arxiv.org/pdf/2210.11416)

 ## How to Get Started with the Model

-You can find the technical
+You can find technical guidance on our library's GitHub [page](https://github.com/boun-tabi-LMG/turkish-lm-tuner).

 ## Training Details

+- The pretraining was performed with Mixture-of-Denoisers (MoD)
+- This version of the model was trained for 1,740,000 steps
+- Batch size: 48
+- Input and output lengths: 512
+- Effectively exposed to 42.7B tokens
+
 Refer to the paper for more information.

+
 ## Evaluation

+We have not yet evaluated the model for biases in any way.
+
+We have only performed finetuning for several understanding and generation tasks:
+
+- Paraphrasing: TAT and OST [source](https://aclanthology.org/2022.icnlsp-1.14.pdf)
+- Summarization: [TRNews](https://dl.acm.org/doi/10.1007/s10579-021-09568-y) and [MLSUM](https://arxiv.org/pdf/2004.14900v1.pdf)
+
 Refer to the paper for more information.

 ## Environmental Impact
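Mixture-of-Denoisers, mentioned under Training Details, mixes several denoising objectives during pretraining. The sketch below illustrates one such denoiser, T5-style span corruption, where masked spans become sentinel tokens in the input and are reconstructed in the target. The sentinel names, example sentence, and span choices are illustrative, not the model's actual preprocessing.

```python
# Toy illustration of one denoiser in a Mixture-of-Denoisers setup:
# span corruption. Masked spans are replaced by sentinel tokens in the
# input; the target lists each sentinel followed by the span it hides.
def span_corrupt(tokens, spans):
    """spans: sorted, non-overlapping (start, end) index pairs to mask."""
    inputs, targets, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    return inputs, targets

tokens = ["Ankara", "Türkiye'nin", "başkenti", "ve", "en", "kalabalık",
          "ikinci", "şehridir"]
inp, tgt = span_corrupt(tokens, [(1, 2), (4, 6)])
print(inp)  # ['Ankara', '<extra_id_0>', 'başkenti', 've', '<extra_id_1>', 'ikinci', 'şehridir']
print(tgt)  # ['<extra_id_0>', "Türkiye'nin", '<extra_id_1>', 'en', 'kalabalık']
```

The other denoisers in MoD vary the span lengths and corruption rates (and include a prefix-LM-style objective), but they all produce input/target pairs of this general shape.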
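The "effectively exposed to 42.7B tokens" figure under Training Details is consistent with the other numbers quoted there: steps × batch size × sequence length. A quick check, counting input tokens only:

```python
# Back-of-the-envelope check of the training token budget from the card.
steps = 1_740_000
batch_size = 48
seq_len = 512  # input length; output tokens would add on top of this

tokens_seen = steps * batch_size * seq_len
print(f"{tokens_seen / 1e9:.2f}B tokens")  # -> 42.76B tokens, close to the stated 42.7B
```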