metadata

license: other
language:
  - tr
library_name: transformers
pipeline_tag: text2text-generation
inference: false

Model Card for TURNA

TURNA is a Turkish language model based on the UL2 framework, suitable for both understanding and generation tasks.

Evaluations across three generation and five understanding tasks in Turkish show that TURNA outperforms several multilingual models and competes with monolingual Turkish models in understanding tasks.

The model is shared with the public to be used solely for non-commercial academic research purposes.

Model Details

36 encoder and decoder layers
16 attention heads
Token embeddings are 1024-dimensional
The feed-forward (MLP) layers have a hidden dimension of 2816 and use gated GELU activations
The parameters of the input and classification layers are not shared
1.1B parameters
Uses a unigram subword tokenizer trained on 10 GB of text consisting of random subsets of OSCAR, OPUS, and Wikipedia
Vocabulary size: 32000 tokens + 128 special tokens
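
As a sanity check on the figures above, the following sketch estimates the parameter count from the listed dimensions. It assumes a T5-style layout in which the encoder and decoder each have 36 layers, a per-head dimension of 64, and no bias terms; none of these is stated in this card. Layer norms and relative position biases are ignored, so treat the result as a rough approximation only.

```python
# Rough parameter estimate for a T5-style encoder-decoder with gated feed-forward layers.
# Assumptions (not stated in the card): 36 layers per stack, head dimension 64, no biases.

D_MODEL = 1024          # token embedding size (from the card)
D_FF = 2816             # feed-forward hidden size (from the card)
N_HEADS = 16            # attention heads (from the card)
D_HEAD = 64             # assumption: per-head dimension
N_LAYERS = 36           # assumption: layers in EACH of encoder and decoder
VOCAB = 32_000 + 128    # tokens + special tokens (from the card)


def attention_params(d_model: int, n_heads: int, d_head: int) -> int:
    """Query, key, value and output projections of one attention block."""
    inner = n_heads * d_head
    return 4 * d_model * inner


def gated_ffn_params(d_model: int, d_ff: int) -> int:
    """Gated feed-forward block: two input projections and one output projection."""
    return 3 * d_model * d_ff


def estimate_params() -> int:
    attn = attention_params(D_MODEL, N_HEADS, D_HEAD)
    ffn = gated_ffn_params(D_MODEL, D_FF)
    encoder_layer = attn + ffn          # self-attention + feed-forward
    decoder_layer = 2 * attn + ffn      # self-attention + cross-attention + feed-forward
    embeddings = 2 * VOCAB * D_MODEL    # input embedding and output layer are not shared
    return N_LAYERS * (encoder_layer + decoder_layer) + embeddings


total = estimate_params()
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes to roughly 1.14B, which is in line with the 1.1B figure listed above.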

Model Description

Developed by: Bogazici University Computer Engineering Department TABILAB (special thanks to VNGRS-AI for sharing their tokenizer)
Funded by: We thank the Google TPU Research Cloud program for providing us with credits to pretrain our model on TPU v3-8 machines. We are grateful to TETAM and BOUN CMPE for providing access to the GPU cluster used in fine-tuning and evaluation experiments.
Model type: Transformer-based encoder-decoder
Language(s) (NLP): Turkish
License: The model is shared with the public to be used solely for non-commercial academic research purposes.

Model Sources

Repository: Training code, Finetuning library
Paper: Arxiv paper

Uses

Direct Use

This model can be used for research purposes. Given some input text, the model tries to predict the words that follow.

Downstream Use

This model can be finetuned using our library to solve custom tasks involving the Turkish language.

This model can be further trained to be more helpful, less harmful, and better suited to dialog use cases.

Out-of-Scope Use

Any commercial or malicious activity.

Bias, Risks, and Limitations

We refer to Flan-T5's official model card:

Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

Ethical considerations and risks

... (ed. The model) is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

Known Limitations

... (ed. The model) has not been tested in real world applications.

Sensitive Use:

... (ed. The model) should not be applied for any unacceptable use cases, e.g., generation of abusive speech.

How to Get Started with the Model

You can find the technical guidance at our library's GitHub page.
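
For orientation only, here is a minimal sketch of loading an encoder-decoder checkpoint through the generic `transformers` sequence-to-sequence classes and generating a continuation. The function name `generate_continuation` is hypothetical, and `model_id` is a placeholder that you must supply yourself. This card does not state the repository identifier or any required prompt format, so consult the library's documentation for the recommended usage.

```python
def generate_continuation(model_id: str, prompt: str, max_new_tokens: int = 50) -> str:
    """Generate text from `prompt` with a seq2seq checkpoint.

    `model_id` is a placeholder: pass the repository name or local path of the
    checkpoint you want to load. It is not specified in this card.
    """
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokenize the prompt and pass only the tensors that generate() needs.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_continuation("<your-model-id>", "Bir varmış, bir yokmuş")` would download the checkpoint on first use. A model with 1.1B parameters needs several gigabytes of memory in full precision.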

Training Details

The pretraining was performed with Mixture-of-Denoisers (MoD)
This version of the model was trained for 1,740,000 steps
Batch size: 48
Input and output lengths: 512
Effectively exposed to 42.7B tokens
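
The token figure above can be reproduced from the other training settings. The short calculation below assumes that each of the 48 sequences in a batch contributes 512 tokens per step, which is an interpretation of the listed input length rather than something the card spells out.

```python
# Tokens seen during pretraining = steps * batch size * sequence length.
STEPS = 1_740_000   # training steps (from the card)
BATCH_SIZE = 48     # sequences per step (from the card)
SEQ_LEN = 512       # tokens per sequence (assumed to be the listed input length)

tokens_seen = STEPS * BATCH_SIZE * SEQ_LEN
print(f"{tokens_seen / 1e9:.2f}B tokens")
```

This gives about 42.76B tokens, consistent with the 42.7B reported above.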

Refer to the paper for more information.

Evaluation

We have not yet evaluated the model for biases.

However, we performed finetuning for several understanding and generation tasks:

Paraphrasing: TAT and OST (source)
Summarization and news title generation: TRNews and MLSUM
Named Entity Recognition: WikiANN and MilliyetNER
Part-of-speech tagging: two Universal Dependencies Turkish treebanks, IMST and BOUN
Semantic Textual Similarity: STSb-tr
Natural language inference: NLI-TR
Text classification: Product reviews, TTC4900, and Tweet sentiments

Refer to the paper for more information.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: TPU v3-8
Hours used: About 400 hours
Cloud Provider: Google Cloud
Compute Region: europe-west4-a
Carbon Emitted: 64.52 kg CO2
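
To make the relationship between these figures explicit, the sketch below uses a common simplification: emissions are approximated as runtime times average power draw times the carbon intensity of the grid. The function name and its parameters are illustrative. The card does not report the power draw or carbon intensity behind the estimate, so only the implied hourly rate is derived here.

```python
def estimate_emissions_kg(hours: float, avg_power_kw: float,
                          carbon_intensity_kg_per_kwh: float) -> float:
    """Simplified estimate: energy used (kWh) times grid carbon intensity (kg CO2/kWh)."""
    return hours * avg_power_kw * carbon_intensity_kg_per_kwh


REPORTED_KG = 64.52   # carbon emitted (from the card)
HOURS = 400           # approximate hours used (from the card)

# The reported numbers imply this combined rate (power * intensity), in kg CO2 per hour.
implied_rate = REPORTED_KG / HOURS
print(f"{implied_rate:.4f} kg CO2 per hour")

# Any power/intensity pair whose product equals the implied rate reproduces the total.
assert abs(estimate_emissions_kg(HOURS, 1.0, implied_rate) - REPORTED_KG) < 1e-9
```

The reported total corresponds to roughly 0.16 kg CO2 per hour of TPU time.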

Technical Specifications

Refer to the paper for more information.

Citation

BibTeX:

Coming soon!

APA:

Coming soon!

Model Card Authors

Paper authors.

Model Card Contact

Onur Güngör