Model description

Based on the facebook/opt-30b model, fine-tuned on chunked Dalio responses.
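A minimal usage sketch with the transformers library, assuming the checkpoint is loaded by its Hub repository id (the id below is a placeholder, not confirmed by this card):

```python
# Minimal generation sketch; the repo id is a placeholder and fp16 weights are
# assumed to fit across the available GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jellywibble/dalio-finetuned-opt-30b"  # placeholder: replace with this model's actual Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

prompt = "What is the most important principle when facing a difficult decision?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```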

Dataset Used

Jellywibble/dalio-pretrain-book-dataset-v2
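The dataset is hosted on the Hugging Face Hub and can be pulled with the datasets library; a small sketch (split and column names may differ):

```python
from datasets import load_dataset

# Load the fine-tuning corpus from the Hub and inspect its splits/columns.
dataset = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2")
print(dataset)
```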

Training Parameters

  • DeepSpeed on 4xA40 GPUs (a hedged configuration sketch follows this list)
  • EOS token <s> ensured to appear only at the beginning of each chunk
  • Gradient accumulation steps = 1 (effective batch size of 4)
  • Learning rate of 3e-6 with the AdamW optimizer
  • Block size of 800
  • Trained for 1 epoch (additional epochs yielded a worse HellaSwag result)
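A sketch of a training setup consistent with the parameters above, using the transformers Trainer. The chunking helper, the "text" column name, the output path, and the ds_config.json file are assumptions for illustration, not the authors' actual code:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BLOCK_SIZE = 800  # block size from the parameter list

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b")
raw = load_dataset("Jellywibble/dalio-pretrain-book-dataset-v2", split="train")

def tokenize_and_chunk(batch):
    """Tokenize without special tokens, concatenate, and cut into BLOCK_SIZE chunks,
    placing a single <s> only at the start of each chunk (assumed "text" column)."""
    ids = tokenizer(batch["text"], add_special_tokens=False)["input_ids"]
    flat = [tok for seq in ids for tok in seq]
    special_id = tokenizer.convert_tokens_to_ids("<s>")
    if special_id is None or special_id == tokenizer.unk_token_id:
        special_id = tokenizer.bos_token_id  # fall back if <s> is not in the vocab
    chunks = [
        [special_id] + flat[i : i + BLOCK_SIZE - 1]
        for i in range(0, len(flat), BLOCK_SIZE - 1)
    ]
    return {"input_ids": chunks, "attention_mask": [[1] * len(c) for c in chunks]}

lm_dataset = raw.map(tokenize_and_chunk, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="opt30b-dalio-chunks",  # assumed output path
    learning_rate=3e-6,                # AdamW is the Trainer's default optimizer
    per_device_train_batch_size=1,     # 4 GPUs x 1 sample = effective batch size of 4
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",        # assumed DeepSpeed config file
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A run like this would typically be launched with the deepspeed launcher across the 4 GPUs; the card does not specify the DeepSpeed JSON (ZeRO stage, offloading), so that is left to the separate config file.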

Metrics
