library_name: nanotron

βš™οΈ Nano-Mistral

Modeling code for using Mistral with Nanotron.

It also contains pretrained weights for Mistral-7B-v0.1 (https://huggingface.co/mistralai/Mistral-7B-v0.1), converted for use with Nanotron.

🚀 Quickstart

# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
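The same launch can be driven from Python. The sketch below uses a hypothetical helper, `build_train_launch` (not part of this repository), that builds the `torchrun` command shown above and sets `CUDA_DEVICE_MAX_CONNECTIONS=1` in the child environment. It assumes the generated YAML has a Nanotron-style `parallelism` section with `dp`, `tp` and `pp` sizes whose product must equal the number of launched processes. Check your generated config before relying on that.

```python
import os
from typing import Dict, List, Tuple


def build_train_launch(
    config_file: str,
    nproc_per_node: int,
    dp: int,
    tp: int,
    pp: int,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the torchrun command and environment for run_train.py (hypothetical helper)."""
    # Assumption: each process owns exactly one (dp, tp, pp) rank, so the
    # parallelism sizes in the config must multiply to the process count.
    if dp * tp * pp != nproc_per_node:
        raise ValueError(
            f"dp*tp*pp = {dp * tp * pp} does not match nproc_per_node = {nproc_per_node}"
        )

    # Copy the current environment (os.environ itself is left untouched)
    # and add the setting from the Quickstart.
    env = dict(os.environ)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"

    cmd = [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "run_train.py",
        "--config-file",
        config_file,
    ]
    return cmd, env


if __name__ == "__main__":
    # Example: 8 GPUs split as 2-way data, 2-way tensor and 2-way pipeline parallelism.
    cmd, env = build_train_launch("config_tiny_mistral.yaml", nproc_per_node=8, dp=2, tp=2, pp=2)
    print(" ".join(cmd))
    # To actually launch: subprocess.run(cmd, env=env, check=True) after `import subprocess`.
```

Failing fast on a mismatched `dp * tp * pp` is cheaper than discovering the error after `torchrun` has started all eight workers.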

🚀 Run generation with pretrained Mistral-7B-v0.1

export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
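Generation runs as a single process (`--nproc_per_node=1`), so the whole model has to fit on one GPU. As a back-of-the-envelope check, the sketch below counts Mistral-7B-v0.1's parameters from the hyperparameters in its Hugging Face `config.json` and estimates the weight memory. It assumes bf16 weights (2 bytes per parameter) and ignores activations and the KV cache. `mistral_param_count` is an illustrative name, not a function in this repository.

```python
def mistral_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_hidden_layers: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    tie_word_embeddings: bool = False,
) -> int:
    """Count parameters of a Mistral-style decoder (no biases, RMSNorm weights only)."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim  # grouped-query attention: fewer K/V heads

    attention = (
        hidden_size * hidden_size  # q_proj
        + 2 * hidden_size * kv_dim  # k_proj and v_proj
        + hidden_size * hidden_size  # o_proj
    )
    mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj (SiLU-gated MLP)
    norms = 2 * hidden_size  # input and post-attention RMSNorm weights
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return num_hidden_layers * per_layer + embeddings + lm_head + final_norm


# Hyperparameters from the Mistral-7B-v0.1 config on the Hugging Face Hub.
n_params = mistral_param_count(
    vocab_size=32000,
    hidden_size=4096,
    intermediate_size=14336,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,
)
bytes_bf16 = 2 * n_params  # assumption: weights held in bf16
print(f"{n_params:,} parameters, ~{bytes_bf16 / 2**30:.1f} GiB of bf16 weights")
```

This gives 7,241,732,096 parameters, roughly 13.5 GiB of weights in bf16, which fits on a single large-memory GPU before activations and the KV cache are added.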

🚀 Use your custom model

  • Update the MistralConfig class in config_tiny_mistral.py to match your model's configuration
  • Update the MistralForTraining class in modeling_mistral.py to match your model's architecture
  • Pass both classes to the DistributedTrainer in run_train.py:
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
  • Run training as usual
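When adapting `MistralConfig` to your own architecture, a few shape constraints must hold for the attention layers to be built. The sketch below is a hypothetical stand-in, `ToyMistralConfig`, not the class in config_tiny_mistral.py (whose field names may differ). It uses Hugging Face-style field names with Mistral-7B-v0.1 defaults and checks that the hidden size splits evenly across heads and that query heads split evenly into grouped-query attention groups. The tensor-parallel check assumes heads are sharded evenly across `tp` ranks.

```python
from dataclasses import dataclass


@dataclass
class ToyMistralConfig:
    """Hypothetical stand-in for MistralConfig; field names follow the Hugging Face Mistral config."""

    vocab_size: int = 32000
    hidden_size: int = 4096
    intermediate_size: int = 14336
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    sliding_window: int = 4096
    rms_norm_eps: float = 1e-5
    rope_theta: float = 10000.0

    def __post_init__(self) -> None:
        # Each attention head gets an equal slice of the hidden dimension.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        # Grouped-query attention: each K/V head is shared by a whole group of query heads.
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads

    def check_tensor_parallel(self, tp: int) -> None:
        # Assumption: query and K/V heads are split evenly across tensor-parallel ranks.
        if self.num_attention_heads % tp or self.num_key_value_heads % tp:
            raise ValueError(f"attention head counts are not divisible by tp={tp}")


# A small config for quick experiments (values are illustrative).
tiny = ToyMistralConfig(
    vocab_size=256,
    hidden_size=64,
    intermediate_size=256,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=2,
)
tiny.check_tensor_parallel(tp=2)
print(tiny.head_dim)
```

Validating in `__post_init__` surfaces a bad head configuration when the config is created, rather than as a shape error deep inside the first forward pass.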