Difference between distilled and the original?
#1
by
ghosthamlet
- opened
Thanks for this great model.
The original model (https://huggingface.co/facebook/nllb-200-1.3B) has a pytorch_model.bin file of the same size as this distilled version's,
so what is the difference between these two models?
As I understand it (from the paper), this is a 1.3B-parameter model distilled from the full 54B NLLB-200 model. It gives better results than the 1.3B dense model (Table 41 in the paper).
Thanks for the reply.
ghosthamlet
changed discussion status to
closed