---
license: apache-2.0
---

**Don't use this model for any applied task. It is too small to be practically useful. It is just a part of a weird research project.**

An extremely small version of T5 with these parameters:

```python
"d_ff": 1024,
"d_kv": 64,
"d_model": 256,
"num_heads": 4,
"num_layers": 1,  # yes, just one layer
```
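To get a feel for how small this is, here is a rough back-of-the-envelope sketch of the weight counts implied by the parameters above. It is not taken from the training code. It assumes the original T5 layout (bias-free attention projections and a two-matrix, non-gated feed-forward block) and a vocabulary of 32,128 tokens, which is the default `vocab_size` of `T5Config` in Transformers; neither assumption is stated in this card, and layer norms and relative position biases are ignored. Under those assumptions, the embedding table is roughly ten times larger than one attention block and one feed-forward block combined.

```python
# Hyperparameters copied from the model card.
config = {"d_ff": 1024, "d_kv": 64, "d_model": 256, "num_heads": 4, "num_layers": 1}

# ASSUMPTION: the card does not state the vocabulary size.
# 32128 is the default vocab_size of T5Config in Transformers.
ASSUMED_VOCAB_SIZE = 32128


def attention_weights(cfg):
    """Weights in the q, k, v and o projections of one attention block (no biases)."""
    inner_dim = cfg["num_heads"] * cfg["d_kv"]
    return 4 * cfg["d_model"] * inner_dim


def feed_forward_weights(cfg):
    """Weights in one feed-forward block, ASSUMING the two-matrix (non-gated) variant."""
    return 2 * cfg["d_model"] * cfg["d_ff"]


def embedding_weights(cfg, vocab_size):
    """Weights in the token embedding table."""
    return vocab_size * cfg["d_model"]


attn = attention_weights(config)
ff = feed_forward_weights(config)
emb = embedding_weights(config, ASSUMED_VOCAB_SIZE)

print(f"one attention block:    {attn:,}")
print(f"one feed-forward block: {ff:,}")
print(f"embedding table:        {emb:,}")
```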

The model was pre-trained on the `realnewslike` subset of C4 for 1 epoch with sequence length `64`. Corresponding WandB run: [click](https://wandb.ai/guitaricet/t5-lm/runs/2yvuxsfz?workspace=user-guitaricet).