---
license: apache-2.0
base_model: bert-large-uncased
tags:
- generated_from_trainer
datasets:
- gokuls/wiki_book_corpus_complete_processed_bert_dataset
metrics:
- accuracy
model-index:
- name: BERT_pretraining_h_100_wo_deepspeed
  results:
  - task:
      name: Masked Language Modeling
      type: fill-mask
    dataset:
      name: gokuls/wiki_book_corpus_complete_processed_bert_dataset
      type: gokuls/wiki_book_corpus_complete_processed_bert_dataset
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.15387755648267093
---
|
|
|
|
|
|
# BERT_pretraining_h_100_wo_deepspeed

This model is a fine-tuned version of [bert-large-uncased](https://huggingface.co/bert-large-uncased), trained with a masked language modeling (MLM) objective on the [gokuls/wiki_book_corpus_complete_processed_bert_dataset](https://huggingface.co/datasets/gokuls/wiki_book_corpus_complete_processed_bert_dataset) dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 5.7778
- Accuracy: 0.1539
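
For quick inspection, the checkpoint can be loaded through the `fill-mask` pipeline. This is a minimal sketch, assuming the model is published together with its tokenizer under the hypothetical repo id `gokuls/BERT_pretraining_h_100_wo_deepspeed` (not confirmed by this card); given the evaluation accuracy above (~0.15), expect weak predictions:

```python
from transformers import pipeline

# Hypothetical repo id; substitute the actual checkpoint location
# (a local output directory also works).
fill_mask = pipeline("fill-mask", model="gokuls/BERT_pretraining_h_100_wo_deepspeed")

# bert-large-uncased tokenizers use [MASK] as the mask token.
for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.4f}")
```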
|
|
|
## Model description

A bert-large-uncased model trained with a masked language modeling objective; more information needed.

## Intended uses & limitations

More information needed

## Training and evaluation data

Trained and evaluated on the gokuls/wiki_book_corpus_complete_processed_bert_dataset dataset; more information needed.
|
|
|
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 208
- eval_batch_size: 208
- seed: 10
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100000
- num_epochs: 100
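
As a reading aid, the list above maps onto `transformers.TrainingArguments` roughly as below. This is a reconstruction, not the original training script; `output_dir`, the per-device interpretation of the batch size, and the evaluation cadence (inferred from the results table) are assumptions:

```python
from transformers import TrainingArguments

# Reconstruction of the run configuration from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="BERT_pretraining_h_100_wo_deepspeed",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=208,  # card reports train_batch_size: 208 (may be total)
    per_device_eval_batch_size=208,   # card reports eval_batch_size: 208
    seed=10,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100_000,
    num_train_epochs=100,
    evaluation_strategy="steps",      # assumed from the eval cadence in the results table
    eval_steps=10_000,
)
```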
|
|
|
### Training results

Evaluation was run every 10,000 steps; the log below covers roughly the first 7 of the 100 scheduled epochs.

| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
|:-------------:|:-----:|:------:|:---------------:|:--------:|
| 6.8769        | 0.36  | 10000  | 6.7582          | 0.1101   |
| 6.4647        | 0.71  | 20000  | 6.4764          | 0.1314   |
| 6.3679        | 1.07  | 30000  | 6.3218          | 0.1407   |
| 6.252         | 1.42  | 40000  | 6.2139          | 0.1454   |
| 6.2132        | 1.78  | 50000  | 6.1398          | 0.1478   |
| 6.0407        | 2.13  | 60000  | 6.0774          | 0.1502   |
| 6.0694        | 2.49  | 70000  | 6.0303          | 0.1516   |
| 5.9996        | 2.84  | 80000  | 5.9893          | 0.1521   |
| 5.9166        | 3.2   | 90000  | 5.9553          | 0.1526   |
| 5.8915        | 3.55  | 100000 | 5.9261          | 0.1530   |
| 5.8924        | 3.91  | 110000 | 5.8996          | 0.1534   |
| 5.8972        | 4.26  | 120000 | 5.8814          | 0.1533   |
| 5.8454        | 4.62  | 130000 | 5.8626          | 0.1532   |
| 5.8104        | 4.97  | 140000 | 5.8494          | 0.1534   |
| 5.8461        | 5.33  | 150000 | 5.8378          | 0.1534   |
| 5.8476        | 5.68  | 160000 | 5.8246          | 0.1536   |
| 5.7255        | 6.04  | 170000 | 5.8155          | 0.1532   |
| 5.8431        | 6.39  | 180000 | 5.8068          | 0.1537   |
| 5.7526        | 6.75  | 190000 | 5.7981          | 0.1537   |
| 5.7826        | 7.1   | 200000 | 5.7886          | 0.1537   |
|
|
|
|
|
### Framework versions

- Transformers 4.37.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
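
To reproduce this environment, the versions above can be pinned with pip (the `+cu121` PyTorch build may require the matching CUDA wheel index; the plain pin is shown here):

```bash
pip install transformers==4.37.1 torch==2.1.2 datasets==2.16.1 tokenizers==0.15.1
```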
|
|