How to run Megatron GPT2 using Transformers
Prerequisites
In this guide, we run all the commands from a folder called $MYDIR, defined (in bash) as:
export MYDIR=$HOME
Feel free to change the location at your convenience.
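The Python script later in this guide reads MYDIR from the environment, so it fails with a KeyError if the variable was not exported in the current shell. A minimal sketch of a more forgiving lookup follows. The helper name resolve_mydir and the fallback to the home directory are assumptions made for illustration; they are not part of Transformers or Megatron.

```python
import os
from pathlib import Path


def resolve_mydir(env=None):
    """Return the working folder: $MYDIR if set and non-empty, else the home directory.

    Hypothetical helper: the home-directory fallback mirrors `export MYDIR=$HOME`.
    """
    env = os.environ if env is None else env
    value = env.get("MYDIR")
    return Path(value) if value else Path.home()
```

Calling resolve_mydir() with no argument reads the real environment; passing a dictionary makes the behavior easy to check without touching the shell.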
To run some of the commands below, you'll have to clone Transformers:
git clone https://github.com/huggingface/transformers.git $MYDIR/transformers
Get the checkpoints from the NVIDIA GPU Cloud
You must create a directory called nvidia/megatron-gpt2-345m:
mkdir -p $MYDIR/nvidia/megatron-gpt2-345m
You can download the checkpoints from the NVIDIA GPU Cloud (NGC). To do so, you have to sign up for NGC and set up the NGC Registry CLI. Further documentation for downloading models can be found in the NGC documentation.
Alternatively, you can directly download the checkpoints using:
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O $MYDIR/nvidia/megatron-gpt2-345m/checkpoint.zip
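Before converting, it can help to confirm that the download is a readable zip archive rather than, say, an error page saved under the same name. The following is a minimal sketch using only the Python standard library. The function name list_checkpoint_members is hypothetical, and this guide does not state which files the archive contains, so the sketch only lists whatever is there.

```python
import zipfile


def list_checkpoint_members(zip_path):
    """Return the file names stored in the archive, or raise if it is not a zip."""
    # is_zipfile also returns False when the path does not exist.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For the download above, the path to pass would be $MYDIR/nvidia/megatron-gpt2-345m/checkpoint.zip.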
Converting the checkpoint
In order to be loaded into Transformers, the checkpoint has to be converted. Run the following command to do so.
The command creates config.json and pytorch_model.bin in $MYDIR/nvidia/megatron-gpt2-345m.
You can move those files to different directories if needed.
python3 $MYDIR/transformers/src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py $MYDIR/nvidia/megatron-gpt2-345m/checkpoint.zip
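After the conversion, a quick check that both output files exist can save a confusing error at load time. This is a minimal sketch, assuming the two file names given above (config.json and pytorch_model.bin); the helper name missing_outputs is hypothetical.

```python
from pathlib import Path

# File names the conversion step is described as producing in this guide.
EXPECTED_FILES = ("config.json", "pytorch_model.bin")


def missing_outputs(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]
```

An empty list means both files are in place. If you moved the files elsewhere, pass that directory instead.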
Text generation
The following code shows how to use the Megatron GPT2 checkpoint and the Transformers API to generate text.
import os
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
# The tokenizer. Megatron was trained with standard tokenizer(s).
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# The path to the config/checkpoint (see the conversion step above).
directory = os.path.join(os.environ['MYDIR'], 'nvidia/megatron-gpt2-345m')
# Load the model from $MYDIR/nvidia/megatron-gpt2-345m.
model = GPT2LMHeadModel.from_pretrained(directory)
# Copy to the device and use FP16.
assert torch.cuda.is_available()
device = torch.device("cuda")
model.to(device)
model.eval()
model.half()
# Generate the sentence.
output = model.generate(input_ids=None, max_length=32, num_return_sequences=1)
# Output the text.
for sentence in output:
    sentence = sentence.tolist()
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(text)
Original code
The original Megatron code can be found here: https://github.com/NVIDIA/Megatron-LM.