
Base model: GPT-Neo

Configs:

  • Vocab size: 10,000
  • Hidden size: 512
  • Max position embeddings: 512
  • Number of layers: 2
  • Number of heads: 4
  • Window size: 256
  • Intermediate size: 1024
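As a minimal sketch, the configuration values listed above can be written as a plain Python dictionary. The key names are chosen to match the argument names of `transformers.GPTNeoConfig`, which is an assumption about how the model was built; the card itself does not show the construction code. The card also does not state the `attention_types` pattern or the exact activation setting, so both are left out here rather than guessed.

```python
# Hyperparameters copied from the "Configs" entry of this model card.
# Key names mirror transformers.GPTNeoConfig arguments (an assumption;
# the card does not show how the configuration object was created).
config = {
    "vocab_size": 10_000,
    "hidden_size": 512,
    "max_position_embeddings": 512,
    "num_layers": 2,
    "num_heads": 4,
    "window_size": 256,
    "intermediate_size": 1024,
}

# The hidden size must split evenly across the attention heads.
assert config["hidden_size"] % config["num_heads"] == 0

# Per-head dimension derived from the values above.
head_dim = config["hidden_size"] // config["num_heads"]
print(head_dim)  # 128
```

If this dictionary is passed to `GPTNeoConfig(**config)`, an `attention_types` value covering exactly two layers would also need to be supplied, since the library default describes a much deeper model.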

Results:

  • Task: GLUE, Score: 58.36, Confidence interval: [57.95, 58.78]
  • Task: BLiMP, Score: 55.64, Confidence interval: [54.68, 56.64]

Dataset used to train AISE-TUDelft/Custom-Activations-GPT-Swish

Collection including AISE-TUDelft/Custom-Activations-GPT-Swish