Model description

Based on the facebook/opt-30b model, fine-tuned on chunked Dalio responses.
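A minimal usage sketch with the transformers library, assuming the checkpoint is loaded by its Hub repository id (the id below is a placeholder, not confirmed by this card):

```python
# Minimal generation sketch; the repo id is a placeholder and fp16 weights are
# assumed to fit across the available GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jellywibble/dalio-finetuned-opt-30b"  # placeholder: replace with this model's actual Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

prompt = "What is the most important principle when facing a difficult decision?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```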

Dataset Used

Jellywibble/dalio-pretrain-book-dataset-v2
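The dataset is hosted on the Hugging Face Hub and can be pulled with the datasets library; a small sketch (split and column names may differ):

```python
from datasets import load_dataset

# Load the fine-tuning corpus from the Hub and inspect its splits/columns.
dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
print(dataset)
```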

Training Parameters

  • DeepSpeed on 4xA40 GPUs (a hedged configuration sketch follows this list)
  • EOS token <s> ensured to appear only at the beginning of each chunk
  • Gradient accumulation steps = 1 (effective batch size of 4)
  • Learning rate of 3e-6 with the AdamW optimizer
  • Block size of 800
  • Trained for 1 epoch (additional epochs yielded a worse HellaSwag result)
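A sketch of a training setup consistent with the parameters above, using the transformers Trainer. The chunking helper, the "text" column name, the output path, and the ds_config.json file are assumptions for illustration, not the authors' actual code:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BLOCK_SIZE = 800  # block size from the parameter list

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b")
raw = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2", split="train")

def tokenize_and_chunk(batch):
    """Tokenize without special tokens, concatenate, and cut into BLOCK_SIZE chunks,
    placing a single <s> only at the start of each chunk (assumed "text" column)."""
    ids = tokenizer(batch["text"], add_special_tokens=False)["input_ids"]
    flat = [tok for seq in ids for tok in seq]
    special_id = tokenizer.convert_tokens_to_ids("<s>")
    if special_id is None or special_id == tokenizer.unk_token_id:
        special_id = tokenizer.bos_token_id  # fall back if <s> is not in the vocab
    chunks = [
        [special_id] + flat[i : i + BLOCK_SIZE - 1]
        for i in range(0, len(flat), BLOCK_SIZE - 1)
    ]
    return {"input_ids": chunks, "attention_mask": [[1] * len(c) for c in chunks]}

lm_dataset = raw.map(tokenize_and_chunk, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="opt30b-dalio-chunks",  # assumed output path
    learning_rate=3e-6,                # AdamW is the Trainer's default optimizer
    per_device_train_batch_size=1,     # 4 GPUs x 1 sample = effective batch size of 4
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",        # assumed DeepSpeed config file
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A run like this would typically be launched with the deepspeed launcher across the 4 GPUs; the card does not specify the DeepSpeed JSON (ZeRO stage, offloading), so that is left to the separate config file.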

Metrics
