This repository contains the trained model files for Ch2 - LLMs are Multitask Learners.
The chapter builds a GPT-2 124M model from scratch for text generation. Please use the best_model.pt
checkpoint for inference.
Because the model was pre-trained on a small amount of data, it has overfitted, but it can still generate sensible text.
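The 124M parameter count follows from the standard GPT-2 small configuration (12 layers, 12 attention heads, 768-dim embeddings, 50,257-token vocabulary, 1,024-token context, tied input/output embeddings). A quick sketch of where that number comes from, assuming these standard hyperparameters:

```python
# Parameter count for the GPT-2 small (124M) architecture.
# Assumes standard hyperparameters and tied input/output embeddings.
V, T, d, L = 50_257, 1_024, 768, 12  # vocab, context length, embed dim, layers

embed = V * d + T * d                        # token + position embeddings
ln = 2 * d                                   # one LayerNorm (weight + bias)
attn = (d * 3 * d + 3 * d) + (d * d + d)     # fused QKV projection + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # up-projection + down-projection
block = ln + attn + ln + mlp                 # one transformer block (two LayerNorms)

total = embed + L * block + ln               # final LayerNorm; lm_head shares wte weights
print(total)  # 124_439_808, i.e. ~124M
```

Weight tying means the output head adds no extra parameters, which is why the total sits just under 125M rather than at roughly 163M.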
Plots
Loss (Train):
Loss (Val):
Perplexity (Val):