retnet-mini-shakespeare

This model was trained from scratch on the "tinyshakespeare" text file.

Model description

A tiny model similar to jploski/falcon-mini-shakespeare, intended to demonstrate training and recurrent inference using a retentive network (https://arxiv.org/pdf/2307.08621.pdf). The code uses Sehyun Choi's implementation of a retentive network (https://github.com/syncdoth/RetNet), with the configuration parameters changed to make it a very tiny model (about 18.5M parameters, stored as F32 safetensors).

  • License: Apache 2.0.
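
A rough sketch of what such a shrunken configuration might look like, assuming the RetNetConfig and RetNetForCausalLM classes from syncdoth/RetNet; the import paths, field names, and sizes below are assumptions for illustration, not this checkpoint's actual configuration.

```python
# Hypothetical sketch: building a very small RetNet with syncdoth/RetNet.
# Class names, field names, and sizes are assumed for illustration only.
from retnet.configuration_retnet import RetNetConfig
from retnet.modeling_retnet import RetNetForCausalLM

config = RetNetConfig(
    vocab_size=1024,              # depends on the tokenizer actually used
    decoder_embed_dim=128,        # tiny hidden size
    decoder_ffn_embed_dim=256,
    decoder_layers=4,
    decoder_retention_heads=4,
)
model = RetNetForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # sanity-check the parameter count
```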

Intended uses & limitations

Intended to demonstrate training and recurrent (O(1) per token) inference using a retentive network.
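
The O(1) claim refers to the recurrent form of retention from the paper: instead of attending over the whole prefix, each step updates a fixed-size state. Below is a minimal sketch of that recurrence (toy dimensions, single head, no normalization or gating); it illustrates the mechanism and is not the repo's code.

```python
import torch

d = 8                                   # toy head dimension
gamma = 0.9                             # decay factor
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

S = torch.zeros(d, d)                   # recurrent state; size is independent of sequence length
for x_n in torch.randn(16, d):          # stream of token embeddings
    q, k, v = x_n @ Wq, x_n @ Wk, x_n @ Wv
    S = gamma * S + torch.outer(k, v)   # S_n = gamma * S_{n-1} + k_n^T v_n
    o = q @ S                           # retention output for this step: O(1) work per token
```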

Training and evaluation data

https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt

Training procedure

Note: updated on 2023-11-10 to work with the current version of syncdoth/RetNet.

The single tinyshakespeare text file, split into paragraphs, was used as both the training and validation set. See:

https://colab.research.google.com/drive/1wZnM7FCe4TsQpoamJ7NDAuQfA3DYiwHi?usp=sharing
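
A minimal sketch of the data preparation, assuming the file is simply downloaded and split on blank lines into paragraph-level examples (the linked Colab notebook is the authoritative procedure):

```python
import requests
from datasets import Dataset

url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
text = requests.get(url).text
paragraphs = [p for p in text.split("\n\n") if p.strip()]

# The same paragraphs serve as both training and validation data in this demo.
dataset = Dataset.from_dict({"text": paragraphs})
```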

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0006
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 40
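
For reference, a minimal sketch of Hugging Face TrainingArguments matching the values listed above (the output directory and evaluation cadence are assumptions, not stated in this card):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="retnet-mini-shakespeare",  # assumed output path
    learning_rate=6e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,         # 32 * 8 = 256 total train batch size
    num_train_epochs=40,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    evaluation_strategy="epoch",           # assumption: the card does not state eval frequency
)
```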

Training results

Training Loss | Epoch | Step | Validation Loss
3.6853        | 10.0  |  370 | 3.4459
2.1973        | 20.0  |  740 | 2.0213
1.3819        | 30.0  | 1110 | 1.3017
1.1658        | 40.0  | 1480 | 1.1566

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1