---
license: mit
---
# 🇹🇷 RoBERTaTurk
## Model description
This is a Turkish RoBERTa base model pretrained on Turkish Wikipedia, Turkish OSCAR, and some news websites.
The final training corpus has a size of 38 GB and contains 329,720,508 sentences.
Thanks to Turkcell, we were able to train the model for 2.5M steps on a machine with an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz, 256 GB of RAM, and 2 x GV100GL [Tesla V100 PCIe 32GB] GPUs.
## Usage
Load the model and tokenizer with the 🤗 Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("burakaytan/roberta-base-turkish-uncased")
model = AutoModelForMaskedLM.from_pretrained("burakaytan/roberta-base-turkish-uncased")
```
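As a quick check, the model can also be used for masked-token prediction through the `fill-mask` pipeline; the Turkish sentence below is only an illustrative example, not one from the training data:

```python
from transformers import pipeline

# Build a fill-mask pipeline around the pretrained model and tokenizer
fill_mask = pipeline(
    "fill-mask",
    model="burakaytan/roberta-base-turkish-uncased",
    tokenizer="burakaytan/roberta-base-turkish-uncased",
)

# RoBERTa models use <mask> as the mask token
for prediction in fill_mask("iki şehir arasında <mask> bulunmaktadır."):
    print(prediction["token_str"], prediction["score"])
```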
## Citation and Related Information
To cite this model:
```bibtex
@INPROCEEDINGS{999,
author={Aytan, Burak and Sakar, C. Okan},
booktitle={2022 30th Signal Processing and Communications Applications Conference (SIU)},
title={Comparison of Transformer-Based Models Trained in Turkish and Different Languages on Turkish Natural Language Processing Problems},
year={2022},
volume={},
number={},
pages={},
doi={}}
```