Smaller variant
#3 by viktor-ferenczi
The original blog post says:
"Furthermore, in a space-constrained environment, the 70k unused embeddings (corresponding to reserved tokens) could be removed from the input/output embedding matrices. This would reduce the model size by approximately 570M parameters."
I suggest making such a reduced version available on HF as well.
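For reference, the quoted figure can be sanity-checked with a quick back-of-the-envelope calculation. This is only a sketch: the hidden size of 4096 is an assumption (the quote does not state it), chosen because it reproduces roughly 570M, and the function name `embedding_params_saved` is made up for illustration.

```python
def embedding_params_saved(num_unused_tokens: int, hidden_size: int, tied: bool = False) -> int:
    """Count parameters freed by dropping unused rows from the embedding matrices.

    Each removed token drops one row of `hidden_size` values from the input
    embedding matrix and, unless the weights are tied, one more from the
    output embedding matrix.
    """
    matrices = 1 if tied else 2
    return num_unused_tokens * hidden_size * matrices


# Assumption: hidden size 4096, untied input/output embeddings.
saved = embedding_params_saved(num_unused_tokens=70_000, hidden_size=4096)
print(f"{saved / 1e6:.0f}M parameters")  # prints "573M parameters"
```

With these assumed values the saving is 2 x 70,000 x 4096 = 573,440,000 parameters, which is consistent with the "approximately 570M" in the blog post.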
I wish, but I'm not sure the checkpoints were released.