GPT-NeoXT-20B model, instruction tuned with Colossal AI on the instruction-tuning datasets listed below (~5.2M data points).

Base Model: togethercomputer/GPT-NeoXT-Chat-Base-20B (GPT-NeoXT-Chat-Base-20B-v0.16 - fine-tuned on feedback data)

Training Details :

  • Epochs: 4
  • Batch Size : 5 per device (instantaneous) x 1 gradient accumulation step x 8 GPUs = 40
  • Block Size : 2020
  • Weight Decay : 0
  • Learning Rate : 1e-6
  • Learning Rate Scheduler Type : Cosine
  • Number of warmup steps : 600
  • Machine : 8xA100 80GB
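
As a rough illustration of the settings above, the sketch below recomputes the effective batch size and evaluates a cosine learning-rate schedule. The card only states "Cosine" with 600 warmup steps, so the linear warmup shape and the decay to zero are assumptions, and `total_steps` is a hypothetical parameter because the total number of optimizer steps is not given here.

```python
import math

# Values taken from the training details above.
PER_DEVICE_BATCH = 5
GRAD_ACCUM_STEPS = 1
NUM_GPUS = 8
PEAK_LR = 1e-6
WARMUP_STEPS = 600


def effective_batch_size(per_device, grad_accum, num_gpus):
    """Number of sequences consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then cosine decay to 0.
    total_steps is hypothetical; the card does not state it.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS))
    # Hypothetical run length of 10,000 steps, for illustration only.
    for step in (0, 300, 600, 5300, 10000):
        print(step, cosine_lr(step, total_steps=10000))
```

With these assumptions the rate peaks at 1e-6 exactly when warmup ends (step 600) and falls smoothly afterwards.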

Training Data Specifics :

  • Labels and Input ids are exactly the same.
  • Block Size is 2020; multiple instructions are packed together into each block.
  • "###" is the EOS Token used in the data.
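
The data layout described above can be sketched roughly as follows: examples are joined into one token stream with "###" after each, the stream is cut into fixed-size blocks, and the labels are a copy of the input ids. This is not the actual preprocessing script. `toy_tokenize` is a hypothetical whitespace tokenizer standing in for the real GPT-NeoX tokenizer, and dropping the trailing partial block is an assumption.

```python
EOS_TOKEN = "###"   # EOS marker used in the data, per the card
BLOCK_SIZE = 2020   # block size from the training details


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: splits on whitespace."""
    return text.split()


def pack_examples(texts, block_size=BLOCK_SIZE, eos_token=EOS_TOKEN,
                  tokenize=toy_tokenize):
    """Concatenate examples (each followed by EOS) and cut into blocks."""
    stream = []
    for text in texts:
        stream.extend(tokenize(text))
        stream.append(eos_token)

    # Assumption: the incomplete block at the end of the stream is dropped.
    usable = (len(stream) // block_size) * block_size

    blocks = []
    for start in range(0, usable, block_size):
        input_ids = stream[start:start + block_size]
        # Labels are an exact copy of the inputs.
        blocks.append({"input_ids": input_ids, "labels": list(input_ids)})
    return blocks


if __name__ == "__main__":
    demo = ["a b c", "d e", "f g h i"]
    # A tiny block size so the packing is visible on toy data.
    for block in pack_examples(demo, block_size=5):
        print(block["input_ids"])
```

Because blocks are cut at fixed positions, a single block can contain several instructions and an instruction can straddle two blocks.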

Datasets used to train manojpreveen/gpt-neoxt-20b-v12