	---
	license: apache-2.0
	language: fa
	widget:
	- text: "از هر دستی بگیری از همون [MASK] میدی"
	- text: "این آخرین باره بهت [MASK] میگم"
	- text: 'چرا آن جوان بیچاره را به سخره [MASK]'
	- text: 'آخه محسن [MASK] هم شد خواننده؟'
	- text: 'پسر عجب [MASK] زد'
	tags:
	- bert-fa
	- bert-persian
	model-index:
	- name: dal-bert
	  results: []
	---


	DAL-BERT: Another pre-trained language model for Persian
	---

	DAL-BERT is a transformer-based model trained on more than 80 gigabytes of Persian text, covering both formal and informal (conversational) language. Its architecture follows the original BERT [[Devlin et al.](https://arxiv.org/abs/1810.04805)].

	How to use the Model
	---
	```python
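	from transformers import BertForMaskedLM, BertTokenizer, pipeline
	# Load the pre-trained masked-LM weights and the matching tokenizer from the Hub
	model = BertForMaskedLM.from_pretrained('sharif-dal/dal-bert')
	tokenizer = BertTokenizer.from_pretrained('sharif-dal/dal-bert')
	# Build a fill-mask pipeline that proposes tokens for the [MASK] position
	fill_sentence = pipeline('fill-mask', model=model, tokenizer=tokenizer)
	# The input must be Persian; this placeholder reads:
	# "Write your sentence here and replace the target word with [MASK]"
	fill_sentence('اینجا جمله مورد نظر خود را بنویسید و کلمه موردنظر را [MASK] کنید')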
	```
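The pipeline call above returns a list of candidate fills. As a minimal sketch, assuming the usual `fill-mask` output for a single `[MASK]` (a list of dicts with `token_str` and `score` keys), the helper below keeps only the highest-scoring candidates. The names `top_predictions` and `demo` are hypothetical and not part of this repository. The example sentence is one of the widget inputs from this card.

```python
def top_predictions(results, k=3):
    """Return (token_str, score) pairs for the k highest-scoring candidates.

    `results` is assumed to be the list of dicts that a fill-mask pipeline
    returns for an input containing exactly one [MASK] token.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], r["score"]) for r in ranked[:k]]


def demo(sentence="پسر عجب [MASK] زد", k=3):
    """Hypothetical helper: downloads the model on first call (needs network)."""
    # Imported here so that top_predictions stays usable without transformers
    from transformers import BertForMaskedLM, BertTokenizer, pipeline

    model = BertForMaskedLM.from_pretrained("sharif-dal/dal-bert")
    tokenizer = BertTokenizer.from_pretrained("sharif-dal/dal-bert")
    fill_sentence = pipeline("fill-mask", model=model, tokenizer=tokenizer)

    # top_k asks the pipeline for k candidates for the masked position
    for token, score in top_predictions(fill_sentence(sentence, top_k=k), k):
        print(f"{token}\t{score:.4f}")
```

Calling `demo()` prints one candidate token per line with its score. The actual tokens depend on the downloaded weights, so no output is shown here.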

	The Training Data
	---
	DAL-BERT was trained on text from newspapers, news agency websites, technology-related sources, user comments, magazines, literary criticism, and blogs.

	Evaluation
	---

	| Training Loss | Epoch | Step    |
	|:-------------:|:-----:|:-------:|
	| 2.1855        | 13    | 7649486 |
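As a rough reading aid, the reported training loss can be converted to a perplexity. This sketch assumes the loss is the mean masked-token cross-entropy measured in nats, which the card does not state. The function name `loss_to_perplexity` is hypothetical.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


training_loss = 2.1855  # value reported in the table above
print(f"perplexity ~ {loss_to_perplexity(training_loss):.2f}")
```

Under that assumption, the value describes the training objective only. It is not a held-out evaluation score.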

	Contributors
	---
	- Arman Malekzadeh [[GitHub](https://github.com/arm-on)]
	- Amirhossein Ramazani, Master's student in AI at Sharif University of Technology [[LinkedIn](https://www.linkedin.com/in/amirhossein-ramazani/)] [[GitHub](https://github.com/amirhossein1376)]