kanishka/opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3


opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset. It achieves the following results on the evaluation set:

Loss: 2.6880
Accuracy: 0.4787
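
The reported loss can be restated as a perplexity, which some readers find easier to compare across models. The sketch below assumes the loss is the mean per-token cross-entropy in nats; that is the usual convention for causal language modeling with the Trainer, but the card does not state it. The function name `perplexity` is illustrative only.

```python
import math

# Evaluation loss reported on this card.
EVAL_LOSS = 2.6880


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) into a perplexity."""
    return math.exp(mean_cross_entropy_nats)


ppl = perplexity(EVAL_LOSS)
print(f"perplexity ~ {ppl:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 14.7.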

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
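
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and the learning rate at a given optimizer step under a linear schedule with warmup. Two things are assumptions rather than facts from the card: the total step count of 45140 is taken from the last row of the training results table, and the schedule shape (linear warmup from 0 to the peak rate, then linear decay to 0) follows the common definition of a linear scheduler. The helper names are hypothetical.

```python
# Values listed in the hyperparameters above.
LEARNING_RATE = 1e-3
WARMUP_STEPS = 32_000
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8

# Assumption: total optimizer steps, read from the final row of the results table.
TOTAL_STEPS = 45_140


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum_steps * num_devices


def linear_schedule_lr(step: int,
                       peak_lr: float = LEARNING_RATE,
                       warmup: int = WARMUP_STEPS,
                       total: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


# 32 sequences x 8 accumulation steps matches total_train_batch_size on one device.
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))

for step in (0, 16_000, 32_000, 38_570, 45_140):
    print(step, linear_schedule_lr(step))
```

If these assumptions hold, the product 32 x 8 = 256 implies a single training device, and warmup covers about 71% of all optimizer steps, so the learning rate only starts to decay in the final stretch of training.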

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
4.0959	0.9999	2257	3.8126	0.3613
3.4463	1.9999	4514	3.2972	0.4099
3.1228	2.9998	6771	3.0851	0.4315
2.9166	3.9998	9028	2.9807	0.4418
2.8402	4.9997	11285	2.9249	0.4476
2.7832	5.9997	13542	2.8851	0.4521
2.7377	6.9996	15799	2.8602	0.4546
2.7101	8.0	18057	2.8389	0.4572
2.684	8.9999	20314	2.8260	0.4586
2.6654	9.9999	22571	2.8155	0.4596
2.6466	10.9998	24828	2.8077	0.4604
2.6474	11.9998	27085	2.8025	0.4615
2.6366	12.9997	29342	2.7983	0.4619
2.625	13.9997	31599	2.7928	0.4626
2.6109	14.9996	33856	2.7690	0.4654
2.5658	16.0	36114	2.7445	0.4686
2.5185	16.9999	38371	2.7228	0.4717
2.4637	17.9999	40628	2.7043	0.4747
2.3969	18.9998	42885	2.6895	0.4774
2.3245	19.9989	45140	2.6880	0.4787
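
One pattern in the table is easier to see as per-epoch differences. The sketch below takes the validation losses from the last nine rows and prints how much each epoch improved on the previous one, labeling each step relative to the 32000 warmup steps listed in the hyperparameters. The rows are copied from the table; the interpretation that the change in pace relates to the end of warmup is a plausible reading, not something the card states.

```python
# (step, validation loss) pairs copied from the last nine rows of the table.
ROWS = [
    (27085, 2.8025),
    (29342, 2.7983),
    (31599, 2.7928),
    (33856, 2.7690),
    (36114, 2.7445),
    (38371, 2.7228),
    (40628, 2.7043),
    (42885, 2.6895),
    (45140, 2.6880),
]

# lr_scheduler_warmup_steps from the hyperparameters section.
WARMUP_STEPS = 32_000


def loss_drops(rows):
    """Return (step, improvement over the previous row) for each consecutive pair."""
    return [(s1, round(l0 - l1, 4)) for (_, l0), (s1, l1) in zip(rows, rows[1:])]


drops = loss_drops(ROWS)
for step, drop in drops:
    phase = "before warmup ends" if step <= WARMUP_STEPS else "after warmup ends"
    print(f"step {step}: validation loss fell by {drop:.4f} ({phase})")
```

In these rows the per-epoch improvement is about 0.004 to 0.006 up to step 31599 and about 0.015 to 0.025 for the next five evaluations, before flattening at the final step.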

Framework versions

Transformers 4.45.1
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0

Model size: 98.1M params (Safetensors)
Tensor type: F32