Update README.md

README.md CHANGED
@@ -4,6 +4,9 @@ language:
 - tr
 library_name: transformers
 pipeline_tag: text2text-generation
+datasets:
+- batubayk/TR-News
+- mlsum
 ---


@@ -19,6 +22,15 @@ The model is shared with the public to be used solely for non-commercial academic research purposes.

 ## Model Details

+- 36 encoder and decoder layers
+- 16 attention heads
+- Token embeddings are 1024-dimensional
+- The multi-layer perceptron layers have 2816 hidden dimensions and employ Gated GeLU activations
+- The parameters of the input and classification layers are not shared
+- 1.1B parameters
+- Uses a unigram subword tokenizer trained on 10GB of text consisting of random subsets of OSCAR, OPUS, and Wikipedia
+- Vocabulary size: 32000 tokens + 128 special tokens
+
 ### Model Description

 <!-- Provide a longer summary of what this model is. -->
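As a rough sanity check, the architecture numbers above are consistent with the stated 1.1B parameters. The sketch below assumes 36 encoder *and* 36 decoder layers, a T5-style gated FFN with three projection matrices, no bias terms, and unshared input/classification embeddings; these are inferences from the card, not its exact accounting.

```python
# Back-of-the-envelope parameter count implied by the model card's numbers.
# Assumptions (not stated explicitly in the card): 36 encoder AND 36 decoder
# layers, gated FFN with three projections (wi_0, wi_1, wo), no biases.
d_model, d_ff = 1024, 2816
vocab = 32000 + 128                  # tokens + special tokens

attn = 4 * d_model * d_model         # Q, K, V, O projections
ffn = 3 * d_model * d_ff             # gated GeLU: two input projections + output
enc_layer = attn + ffn               # self-attention + FFN
dec_layer = 2 * attn + ffn           # self-attention + cross-attention + FFN

embeddings = 2 * vocab * d_model     # input embeddings + unshared classification layer
total = 36 * enc_layer + 36 * dec_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # -> 1.14B parameters
```

The result lands close to the quoted 1.1B; small discrepancies would come from layer norms, relative position biases, and similar terms omitted here.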
@@ -30,7 +42,7 @@ The model is shared with the public to be used solely for non-commercial academic research purposes.
 - **Language(s) (NLP):** Turkish
 - **License:** The model is shared with the public to be used solely for non-commercial academic research purposes.

-### Model Sources
+### Model Sources

 <!-- Provide the basic links for the model. -->

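The Model Details section mentions a unigram subword tokenizer. As a toy illustration of how unigram segmentation works, the sketch below runs Viterbi decoding over a tiny made-up vocabulary with made-up probabilities; the real tokenizer was trained on 10GB of OSCAR, OPUS, and Wikipedia text and has 32000 tokens.

```python
import math

# Toy unigram subword segmentation: pick the split of a word into vocabulary
# pieces that maximizes the sum of log-probabilities (Viterbi decoding).
# Vocabulary and probabilities are illustrative only.
vocab = {"anka": 0.04, "ra": 0.05, "an": 0.06, "ka": 0.05,
         "r": 0.01, "a": 0.08, "n": 0.02, "k": 0.01}

def segment(word):
    # best[i] = (score, split point) for the prefix word[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(word)
    for i in range(1, len(word) + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in vocab:
                score = best[j][0] + math.log(vocab[piece])
                if score > best[i][0]:
                    best[i] = (score, j)
    # Backtrack through the split points to recover the pieces.
    pieces, i = [], len(word)
    while i > 0:
        j = best[i][1]
        pieces.append(word[j:i])
        i = j
    return pieces[::-1]

print(segment("ankara"))  # -> ['anka', 'ra']
```

A real unigram tokenizer (e.g. SentencePiece) additionally learns the piece probabilities via EM and prunes the vocabulary; the decoding step is the same idea.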
@@ -51,9 +63,9 @@ This model can be used for research purposes. You give some text and this model

 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

-This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your
+This model can be finetuned using [our library](https://github.com/boun-tabi-LMG/turkish-lm-tuner) to solve your custom tasks involving the Turkish language.

-This model can be further trained
+This model can be further trained to be more helpful, less harmful, and better suited to dialog use cases.

 ### Out-of-Scope Use

@@ -82,14 +94,28 @@ We refer to the Flan-T5's [official model card](https://arxiv.org/pdf/2210.11416)

 ## How to Get Started with the Model

-You can find the technical
+You can find technical guidance on our library's GitHub [page](https://github.com/boun-tabi-LMG/turkish-lm-tuner).

 ## Training Details

+- The pretraining was performed with Mixture-of-Denoisers (MoD)
+- This version of the model was trained for 1,740,000 steps
+- Batch size: 48
+- Input and output lengths: 512
+- Effectively exposed to 42.7B tokens
+
 Refer to the paper for more information.

+
 ## Evaluation

+We have not yet evaluated the model for biases in any way.
+
+We have only performed finetuning for several understanding and generation tasks:
+
+- Paraphrasing: TAT and OST [source](https://aclanthology.org/2022.icnlsp-1.14.pdf)
+- Summarization: [TRNews](https://dl.acm.org/doi/10.1007/s10579-021-09568-y) and [MLSUM](https://arxiv.org/pdf/2004.14900v1.pdf)
+
 Refer to the paper for more information.

 ## Environmental Impact
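Mixture-of-Denoisers, mentioned under Training Details, mixes several denoising objectives during pretraining. The sketch below illustrates one such denoiser, T5-style span corruption, where masked spans become sentinel tokens in the input and are reconstructed in the target. The sentinel names, example sentence, and span choices are illustrative, not the model's actual preprocessing.

```python
# Toy illustration of one denoiser in a Mixture-of-Denoisers setup:
# span corruption. Masked spans are replaced by sentinel tokens in the
# input; the target lists each sentinel followed by the span it hides.
def span_corrupt(tokens, spans):
    """spans: sorted, non-overlapping (start, end) index pairs to mask."""
    inputs, targets, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    return inputs, targets

tokens = ["Ankara", "Türkiye'nin", "başkenti", "ve", "en", "kalabalık",
          "ikinci", "şehridir"]
inp, tgt = span_corrupt(tokens, [(1, 2), (4, 6)])
print(inp)  # ['Ankara', '<extra_id_0>', 'başkenti', 've', '<extra_id_1>', 'ikinci', 'şehridir']
print(tgt)  # ['<extra_id_0>', "Türkiye'nin", '<extra_id_1>', 'en', 'kalabalık']
```

The other denoisers in MoD vary the span lengths and corruption rates (and include a prefix-LM-style objective), but they all produce input/target pairs of this general shape.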
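The "effectively exposed to 42.7B tokens" figure under Training Details is consistent with the other numbers quoted there: steps × batch size × sequence length. A quick check, counting input tokens only:

```python
# Back-of-the-envelope check of the training token budget from the card.
steps = 1_740_000
batch_size = 48
seq_len = 512  # input length; output tokens would add on top of this

tokens_seen = steps * batch_size * seq_len
print(f"{tokens_seen / 1e9:.2f}B tokens")  # -> 42.76B tokens, close to the stated 42.7B
```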