mehdie/ancient_semitic_bert

	---
	tags:
	- generated_from_trainer
	model-index:
	- name: ancient_semitic_bert
	  results: []
	datasets:
	- bigscience-data/roots_ar_openiti_proc
	- mehdie/sefaria
	language:
	- ar
	- he
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# ancient_semitic_bert

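	This model was trained with the Hugging Face `Trainer`. The auto-generated card names neither a base checkpoint nor a training dataset; the metadata above lists `bigscience-data/roots_ar_openiti_proc` and `mehdie/sefaria` as datasets, with languages `ar` and `he`.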
	It achieves the following results on the evaluation set after 40 epochs:
	- Loss: 1.9118
	- Perplexity: 6.77

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
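Until this section is filled in, the following is only a minimal sketch of how the checkpoint might be tried out. It assumes that the Hub repository id is `mehdie/ancient_semitic_bert`, that the checkpoint carries a masked-language-modelling head, and that the `transformers` library is installed. The helper names `build_fill_mask` and `top_predictions` are illustrative and not part of any library.

```python
# Assumption: the Hub repository id, inferred from the organisation and model name.
REPO_ID = "mehdie/ancient_semitic_bert"


def build_fill_mask(repo_id: str = REPO_ID):
    """Return a fill-mask pipeline; assumes the checkpoint has a masked-LM head."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=repo_id)


def top_predictions(fill_mask, text_with_mask: str, k: int = 5):
    """Return (token, score) pairs for the single masked position in the text."""
    results = fill_mask(text_with_mask, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


if __name__ == "__main__":
    fill_mask = build_fill_mask()
    # Use the tokenizer's own mask token rather than hard-coding "[MASK]".
    mask = fill_mask.tokenizer.mask_token
    # Illustrative Hebrew input (Genesis 1:1) with one word masked.
    text = f"בראשית ברא אלהים את {mask} ואת הארץ"
    for token, score in top_predictions(fill_mask, text):
        print(f"{token}\t{score:.4f}")
```

Reading the mask token from the tokenizer avoids guessing the special-token string, which this card does not document.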

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 64
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 10000
	- num_epochs: 40.0
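
The batch-size and scheduler entries above fit together arithmetically. The sketch below, in plain Python, recomputes the effective batch size and evaluates a linear warmup-then-decay schedule at a few steps. It assumes a total of 2212760 optimizer steps (the step count of the final epoch in the training results), no gradient accumulation, and a schedule that decays linearly to zero after warmup, which is the usual meaning of a `linear` scheduler. The function names are illustrative, not taken from the training script.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 5e-5
PER_DEVICE_TRAIN_BATCH_SIZE = 16
NUM_DEVICES = 4
WARMUP_STEPS = 10_000
# Assumption: total optimizer steps, taken from the final row of the results table.
TOTAL_STEPS = 2_212_760


def effective_batch_size(per_device: int, num_devices: int) -> int:
    """Sequences consumed per optimizer step, assuming no gradient accumulation."""
    return per_device * num_devices


def linear_lr(step: int,
              peak: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at a step: linear ramp to the peak, then linear decay to zero."""
    if step < warmup:
        return peak * step / warmup
    remaining = max(0, total - step)
    return peak * remaining / (total - warmup)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES))
    for step in (0, 5_000, 10_000, 1_000_000, TOTAL_STEPS):
        print(f"step {step:>9}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the warmup covers well under 1% of training, so the learning rate spends almost the whole run on the decaying part of the schedule.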

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:-------:|:---------------:|
	| 6.778 | 1.0 | 55319 | 6.4618 |
	| 6.4271 | 2.0 | 110638 | 6.3701 |
	| 6.3616 | 3.0 | 165957 | 6.3217 |
	| 6.3257 | 4.0 | 221276 | 6.2966 |
	| 6.3001 | 5.0 | 276595 | 6.2759 |
	| 6.2834 | 6.0 | 331914 | 6.2610 |
	| 6.2699 | 7.0 | 387233 | 6.2465 |
	| 6.2565 | 8.0 | 442552 | 6.1939 |
	| 6.2221 | 9.0 | 497871 | 6.1154 |
	| 6.0721 | 10.0 | 553190 | 5.9524 |
	| 5.9212 | 11.0 | 608509 | 5.7947 |
	| 5.8113 | 12.0 | 663828 | 5.7161 |
	| 5.7509 | 13.0 | 719147 | 5.6614 |
	| 5.7053 | 14.0 | 774466 | 5.6158 |
	| 5.6665 | 15.0 | 829785 | 5.5774 |
	| 5.634 | 16.0 | 885104 | 5.5448 |
	| 5.6055 | 17.0 | 940423 | 2.7563 |
	| 3.3308 | 18.0 | 995742 | 2.5443 |
	| 2.6179 | 19.0 | 1051061 | 2.4196 |
	| 2.5324 | 20.0 | 1106380 | 2.3393 |
	| 2.4791 | 21.0 | 1161699 | 2.2755 |
	| 2.4105 | 22.0 | 1217018 | 2.2241 |
	| 2.3582 | 23.0 | 1272337 | 2.1772 |
	| 2.3281 | 24.0 | 1327656 | 2.1416 |
	| 2.2987 | 25.0 | 1382975 | 2.1137 |
	| 2.7859 | 26.0 | 1438294 | 2.0950 |
	| 2.2728 | 27.0 | 1493613 | 2.0685 |
	| 2.2308 | 28.0 | 1548932 | 2.0499 |
	| 2.1739 | 29.0 | 1604251 | 2.0082 |
	| 2.1569 | 30.0 | 1659570 | 1.9939 |
	| 2.1425 | 31.0 | 1714889 | 1.9802 |
	| 2.1318 | 32.0 | 1770208 | 1.9669 |
	| 2.1207 | 33.0 | 1825527 | 1.9583 |
	| 2.1111 | 34.0 | 1880846 | 1.9477 |
	| 2.102 | 35.0 | 1936165 | 1.9409 |
	| 2.0943 | 36.0 | 1991484 | 1.9313 |
	| 2.0871 | 37.0 | 2046803 | 1.9236 |
	| 2.0736 | 38.0 | 2102122 | 1.9191 |
	| 2.0693 | 39.0 | 2157441 | 1.9147 |
	| 2.0653 | 40.0 | 2212760 | 1.9118 |
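
The perplexity reported at the top of this card can be recovered from the final validation loss. The sketch below assumes the loss is a mean cross-entropy measured in nats, in which case perplexity is its exponential; the function name `perplexity` is illustrative.

```python
import math

# Final validation loss, copied from the last row of the table above.
FINAL_VALIDATION_LOSS = 1.9118


def perplexity(loss: float) -> float:
    """Perplexity as the exponential of a mean cross-entropy loss given in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"perplexity after 40 epochs: {perplexity(FINAL_VALIDATION_LOSS):.2f}")
```

The same conversion applies to any other row, which makes it easy to compare epochs on the perplexity scale instead of the loss scale.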


	### Framework versions

	- Transformers 4.38.2
	- PyTorch 2.3.0a0+ebedce2
	- Datasets 2.17.1
	- Tokenizers 0.15.2