Create README.md
---
language:
- tr
arxiv: 2403.01308
library_name: transformers
pipeline_tag: text2text-generation
---

# VBART Model Card

## Model Description
VBART is the first sequence-to-sequence model trained on Turkish corpora from scratch. It was trained by VNGRS, and its training was completed in February 2023.
With fine-tuning, the model is capable of text transformation tasks such as summarization, paraphrasing, and title generation.

The model scores better on many tasks, despite being much smaller than other implementations.

This repository contains the fine-tuned weights of VBART for the paraphrasing task.

- **Developed by:** [VNGRS](https://vngrs.com/)
- **Model type:** Transformer encoder-decoder based on mBART
- **Language(s) (NLP):** Turkish
- **License:** [More Information Needed]
- **Fine-tuned from model:** VBART-Large
- **Paper:** [arXiv](https://arxiv.org/abs/2403.01308)
## How to Get Started with the Model

Use the code below to get started with the model.

-> A code example will be added once the model is uploaded.

[More Information Needed]
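Until the official snippet is added, the following is a minimal, unofficial sketch of how the checkpoint could be loaded with the `transformers` library. The repository ID `vngrs-ai/VBART-Large-Paraphrasing`, the example sentence, and the generation settings are assumptions, not the official usage.

```python
# Minimal sketch, not the official example. The repository ID below is an
# assumed placeholder; adjust it to the actual Hugging Face repo once published.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "vngrs-ai/VBART-Large-Paraphrasing"  # assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Paraphrase a Turkish sentence with beam search.
text = "Akşam yemeğinden sonra parkta uzun bir yürüyüş yaptık."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```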
## Training Details

### Training Data

The base model's training data is a filtered, mixed corpus made of the Turkish portions of the [OSCAR-2201](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201) and [mC4](https://huggingface.co/datasets/mc4) datasets. These datasets consist of documents of unstructured web-crawl data. More information about them can be found on their respective pages. The data was then filtered using a set of heuristics and certain rules, explained in the appendix of our [paper](https://arxiv.org/abs/2403.01308).

The fine-tuning dataset is TODO.

### Limitations

This model is fine-tuned for the paraphrasing task. It is not intended to be used in any other case and cannot be fine-tuned to any other task with the full performance of the base model.

### Training Procedure
Pretrained for 30 days on a total of 708B tokens.

#### Hardware

- **GPUs:** 8x Nvidia A100-80GB

#### Software

- TensorFlow

#### Hyperparameters

##### Pretraining

- **Training regime:** fp16 mixed precision
- **Training objective:** Sentence permutation and span masking, with mask lengths sampled from a Poisson distribution (λ = 3.5) and 30% of tokens masked in total (see the sketch after this list)
- **Optimizer:** Adam (β1 = 0.9, β2 = 0.98, ε = 1e-6)
- **Scheduler:** Linear decay with 20,000 warm-up steps
- **Dropout:** 0.1 (dropped to 0.05 and then 0 in the last 160k steps)
- **Learning rate:** 5e-6
- **Training amount:** 708B tokens
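To make the noising objective concrete, here is a small illustrative sketch of BART-style span masking with Poisson-sampled span lengths and a 30% masking budget, plus sentence permutation. This is not the VBART training code; the function names and mask token are assumptions.

```python
# Illustrative sketch of the noising objective described above; this is NOT the
# VBART training code. Function names and the mask token are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def span_mask(tokens, mask_ratio=0.30, poisson_lambda=3.5, mask_token="<mask>"):
    """Replace spans with a single mask token until ~mask_ratio of the original
    tokens are covered. Span lengths are sampled from Poisson(poisson_lambda)."""
    tokens = list(tokens)
    budget = int(round(mask_ratio * len(tokens)))
    masked = 0
    while masked < budget and len(tokens) > 1:
        length = min(max(1, int(rng.poisson(poisson_lambda))), budget - masked)
        start = int(rng.integers(0, len(tokens) - length + 1))
        tokens[start:start + length] = [mask_token]  # one mask token per span
        masked += length
    return tokens

def permute_sentences(sentences):
    """Sentence permutation: shuffle the order of sentences in a document."""
    return [sentences[i] for i in rng.permutation(len(sentences))]

print(span_mask("bu cümle span maskeleme örneği için kullanılan bir girdidir".split()))
```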

##### Fine-tuning

- **Training regime:** fp16 mixed precision
- **Optimizer:** Adam (β1 = 0.9, β2 = 0.98, ε = 1e-6)
- **Scheduler:** Linear decay (see the sketch after this list)
- **Dropout:** 0.1
- **Learning rate:** 5e-5
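The sketch below shows one common way to express the linear warm-up and decay schedule used for both pretraining and fine-tuning. The total step counts and the zero warm-up for fine-tuning are assumptions, since the card does not state them.

```python
# A generic linear warm-up + linear decay schedule, as a sketch only; the exact
# scheduler used for VBART is not specified beyond "linear decay". The
# total_steps values below are placeholders, not the real training lengths.
def linear_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linearly ramp up to peak_lr over warmup_steps, then decay linearly to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

# Pretraining-style settings (peak 5e-6, 20k warm-up) vs. fine-tuning (peak 5e-5).
for step in (0, 10_000, 20_000, 200_000):
    print(step,
          linear_lr(step, peak_lr=5e-6, warmup_steps=20_000, total_steps=400_000),
          linear_lr(step, peak_lr=5e-5, warmup_steps=0, total_steps=400_000))
```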
#### Metrics

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f8b3c84588fe31f435a92b/D-rzfE5NbwLDy47gPkF1_.png)

## License

## Citation

```
@article{turker2024vbart,
  title={VBART: The Turkish LLM},
  author={Turker, Meliksah and Ari, Erdi and Han, Aydin},
  journal={arXiv preprint arXiv:2403.01308},
  year={2024}
}
```