UPDATE (2023-09-23):
This model is obsolete. Thanks to quantization, you can now run AI Dungeon 2 Classic (a 1.5B model) on equivalent hardware. See here.
# AID-Neo-125M

## Model description
This model was inspired by, and finetuned on the same dataset as, KoboldAI's GPT-Neo-125M-AID (Mia) model: the AI Dungeon dataset (`text_adventures.txt`). It was trained to fix a possible oversight in the original model, which had been trained with an unfortunate bug. You could technically consider it a "retraining" of the same model using different software.
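For reference, here is a minimal generation sketch using the `transformers` library. The repo id below is a placeholder rather than this model's actual path, and the prompt and sampling parameters are arbitrary choices; adjust all of them to taste.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hugging Face path of this model.
model_id = "<username>/AID-Neo-125M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A text-adventure style prompt, matching the second-person voice
# of the AI Dungeon training data.
prompt = "You are a knight venturing into a dark cave. You"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```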
## Training hyperparameters
The following hyperparameters were used during training (see the sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
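For concreteness, here is a rough sketch of how these values map onto Hugging Face's `Trainer` API. This is an assumption about the setup, not the actual training script: the preparation of `text_adventures.txt` into a dataset is omitted, and the `output_dir` name is made up.

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Base checkpoint: the same 125M GPT-Neo that Mia was finetuned from.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

args = TrainingArguments(
    output_dir="aid-neo-125m",      # made-up directory name
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the
    # Trainer's default optimizer settings, so nothing to override here.
)

# trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset prep omitted
# trainer.train()
```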