nixgd committed
Commit
97b7275
1 Parent(s): 85c27af

Correct maximum positional embeddings


The model appears to have been trained with a context window of 512, not 2048 as the config claims. This can be seen from the average loss by sequence position on the GPT-4-generated TinyStories dataset (packed into inputs of length 2048):

![image.png](https://cdn-uploads.huggingface.co/production/uploads/65b0cb8770773c0ab8fde1e0/qXnk9-RtXGrXlUlkZCxl3.png)

It would be great to get this changed (for all TinyStories models), as the current config is misleading. A sketch of the measurement follows below.
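For reference, here is a minimal sketch of that diagnostic, assuming a TinyStories checkpoint such as `roneneldan/TinyStories-33M` (the model id and batching are placeholders, not necessarily the exact setup behind the plot):

```python
# Compute average cross-entropy loss per sequence position over packed
# 2048-token inputs. A sharp jump after position ~512 indicates the
# effective trained context window is 512, as in the plot above.
import torch
from transformers import AutoModelForCausalLM

model_id = "roneneldan/TinyStories-33M"  # assumption: any TinyStories checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

@torch.no_grad()
def loss_by_position(input_ids: torch.Tensor) -> torch.Tensor:
    """input_ids: (batch, 2048) packed token ids; returns mean loss per position."""
    logits = model(input_ids).logits                  # (batch, seq, vocab)
    shift_logits = logits[:, :-1].transpose(1, 2)     # (batch, vocab, seq-1)
    shift_labels = input_ids[:, 1:]                   # (batch, seq-1)
    losses = torch.nn.functional.cross_entropy(
        shift_logits, shift_labels, reduction="none"
    )                                                 # per-token loss, (batch, seq-1)
    return losses.mean(dim=0)                         # average over the batch
```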

Files changed (1):
  1. config.json +1 -1
config.json CHANGED
```diff
@@ -28,7 +28,7 @@
   "initializer_range": 0.02,
   "intermediate_size": null,
   "layer_norm_epsilon": 1e-05,
-  "max_position_embeddings": 2048,
+  "max_position_embeddings": 512,
   "model_type": "gpt_neo",
   "num_heads": 16,
   "num_layers": 4,
```