CodeParrot

This is a small version of the CodeParrot tokenizer, trained on the CodeParrot Python code dataset. The tokenizer was trained as part of Chapter 10, "Training Transformers from Scratch", of the NLP with Transformers book. The full code is available in the accompanying GitHub repository.
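As a rough illustration of how a tokenizer like this might be used, the sketch below loads it with `AutoTokenizer.from_pretrained` from the `transformers` library and counts tokens per line of Python source. The helper names `load_tokenizer` and `tokens_per_line` are hypothetical, and the repository id in the commented example is a placeholder, not this repository's actual id.

```python
def load_tokenizer(repo_id):
    """Load a tokenizer from the Hugging Face Hub.

    Requires the `transformers` package and network access.
    `repo_id` is whatever Hub id this tokenizer is published under (assumption:
    the caller supplies it; it is not stated in this README).
    """
    # Imported lazily so tokens_per_line can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokens_per_line(tokenizer, source):
    """Return (line, token_count) pairs for each non-empty line of `source`.

    Works with any object exposing a `tokenize(text) -> list` method.
    """
    return [
        (line, len(tokenizer.tokenize(line)))
        for line in source.splitlines()
        if line.strip()
    ]


# Example (placeholder repo id; substitute the real one):
# tokenizer = load_tokenizer("your-username/your-tokenizer-repo")
# for line, n in tokens_per_line(tokenizer, "def add(a, b):\n    return a + b\n"):
#     print(n, repr(line))
```

Counting tokens per line is one simple way to see how compactly a code-specific tokenizer encodes indentation and common Python keywords compared with a general-purpose one.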