How to download and use the model.
Hello, does anyone have a snippet of Python code showing how to download and use the model, or anything else that shows the procedure for using it?
Hello, you can download the model with Git LFS and then run it using the inference script in the GitHub repo.
- Accept the Llama2 license and download the Llama2 weights
- Download the Amharic finetune from this repository as shown here: https://huggingface.co/docs/hub/models-downloading
- Clone the GitHub repo and put your paths to Llama2 and the peft model into the inference script here: https://github.com/iocuydi/amharic-llama-llava/blob/main/inference/run_inf.py
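As an alternative to Git LFS, the second step can be sketched in Python with the `huggingface_hub` library. This is a minimal sketch, not the author's method: the repository id is taken from the clone URL quoted later in this thread, and the `local_dir` default is a hypothetical folder name.

```python
from pathlib import Path

# Repository id of the Amharic finetune, as it appears in the clone URL in this thread.
FINETUNE_REPO = "iocuydi/llama-2-amharic-3784m"


def download_finetune(local_dir: str = "llama-2-amharic-3784m") -> Path:
    """Download every file of the finetune repository into local_dir."""
    # Imported lazily so this file still loads when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=FINETUNE_REPO, local_dir=local_dir)
    return Path(path)


# Usage (needs network access and `pip install huggingface_hub`):
#   finetune_dir = download_finetune()
```

The returned directory is the one the peft model path in the inference script should point to.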
What is the peft model?
This line doesn't seem to import inside the run_inf.py file:
from model_utils import load_model, load_peft_model
I can't find the model_utils file anywhere in the github repo
Added that file to the github repo.
Peft stands for "Parameter Efficient Fine Tuning." It allows large models to be finetuned more easily; there is more about it here: https://huggingface.co/blog/peft
With this and most Llama finetunes, you load the original Llama weights first, and then a smaller set of Peft weights from the finetune.
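The two-stage load described above can be sketched with the `transformers` and `peft` libraries directly. This is an assumption about what the repo's `load_model` and `load_peft_model` helpers do, not a copy of them. The file names checked in `looks_like_peft_dir` are the ones PEFT normally writes when saving an adapter; whether this particular finetune ships exactly those files is also an assumption.

```python
from pathlib import Path


def looks_like_peft_dir(path: str) -> bool:
    """Return True if the directory holds a PEFT adapter (config plus weights)."""
    p = Path(path)
    has_config = (p / "adapter_config.json").is_file()
    has_weights = any(
        (p / name).is_file()
        for name in ("adapter_model.bin", "adapter_model.safetensors")
    )
    return has_config and has_weights


def load_base_with_adapter(base_dir: str, peft_dir: str):
    """Load the Llama 2 base weights, then apply the smaller PEFT weights on top."""
    # Lazy imports: transformers and peft are only needed when actually loading.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    if not looks_like_peft_dir(peft_dir):
        raise FileNotFoundError(f"no PEFT adapter files found in {peft_dir}")
    base = AutoModelForCausalLM.from_pretrained(base_dir)  # HF-format Llama 2
    return PeftModel.from_pretrained(base, peft_dir)  # adapter applied to base
```

Running the directory check first gives a clearer error than a failed load when the peft path points at the wrong folder.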
Thank you for doing that. So I did the following as you described:
- Downloaded the llama-2-7b model using the download.sh script
- Downloaded this Amharic model from Hugging Face using Git LFS
- Cloned the GitHub repository and put the path to the Llama model in the run_inf.py file
Questions:
- Where do I use the Amharic model I downloaded from here (step 2 above)?
- What exactly is the path below?
peft_model = '/path/to/checkpoint'
- How do I replace the Llama-2 tokenizer with the Llama-2-Amharic tokenizer?
Thank you.
Forgot to mention: you need to convert Llama2 to Hugging Face format, for example with this script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
- The "main_path" param should point at the directory with the Llama weights after they are converted to Hugging Face format.
- The peft model path is the path to the finetuned checkpoint. If no checkpoint is loaded, you're just using the original Llama2. This path should point to a directory containing the files downloaded from this HF repository (the finetuned weights).
- Replace the tokenizer files that come with Llama2 with the tokenizer files from this repository.
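The tokenizer swap in the last bullet can be sketched as a small copy script. The list of file names is an assumption about which tokenizer files exist; the function copies only those actually present in the finetune directory, and both directory arguments are whatever local paths you used.

```python
import shutil
from pathlib import Path

# Assumed tokenizer file names; only the ones present in the finetune are copied.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def replace_tokenizer_files(finetune_dir: str, llama_dir: str) -> list:
    """Copy tokenizer files from the finetune over those in the Llama 2 directory."""
    copied = []
    for name in TOKENIZER_FILES:
        src = Path(finetune_dir) / name
        if src.is_file():
            shutil.copy2(src, Path(llama_dir) / name)  # overwrites the original file
            copied.append(name)
    return copied
```

The returned list shows which files were replaced, so it is easy to spot a case where only `tokenizer.model` was found.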
Thank you!! Regarding the tokenizer files, would replacing only the tokenizer.model file work? I tried that and it does respond in Amharic, though I'm not sure whether replacing the remaining files would improve its output.
You should replace all the applicable tokenizer files with ours. A couple other tips for prompting:
- Try different system prompts (the initial instruction about being an Amharic assistant), but keep the system prompt in English.
- Experiment with different hyperparameters depending on the task. Higher top k/temperature can give more varied and creative answers, but also a greater chance of hallucinations and wrong answers.
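To see why higher temperature and top k produce more varied output, here is a toy calculation in plain Python. The logits are made-up numbers, not model output, and the function only illustrates the general idea behind temperature and top-k sampling rather than any library's exact implementation.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=None):
    """Softmax over logits after temperature scaling and optional top-k filtering."""
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep the k largest logits (ties at the cutoff are kept too); drop the rest.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four candidate tokens
cold = next_token_probs(logits, temperature=0.5, top_k=2)  # focused
hot = next_token_probs(logits, temperature=1.5, top_k=4)   # more varied
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

With the low settings nearly all probability sits on the top token; with the high settings the less likely tokens get a real chance of being sampled, which is where both the creativity and the wrong answers come from.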
Thanks for the tips.
I was thinking of continuing the pre-training with more Amharic data. Unfortunately, I wasn't really able to find good resources on how to do that. Can you please recommend some helpful resources?
The scripts in the GitHub repo can be used for pretraining and finetuning. Unless you have a massive amount of Amharic data (billions of tokens), additional pretraining likely will not help much, and finetuning would be a more effective strategy. You can also check out the Chinese Llama Alpaca paper/repo for more details; much of this work was based on it.
Alright, thanks a lot for your support!!
One more thing. I tried to finetune the model on top of the gari model loaded with peft. When I then run inference by loading both the gari peft and my finetuned peft one after another and ask a question, it no longer gives answers it previously got right. For example, if I ask "what medicine should I take if I have a flu", it answers well with the gari peft alone, but outputs gibberish when both the gari peft and the newer finetuned peft are loaded.
MAIN_PATH = '/model/Llama-2-7b-hf'
peft_model = '/model/llama-2-amharic-3784m'
#newer finetuned version on top of the garri model
peft_model2 = '/home/user/model/output'
model = load_model(model_name, quantization)
model = load_peft_model(model, peft_model)
model = load_peft_model(model, peft_model2)
Is the way I'm loading both peft models correct?
Only load one peft model. If you load another, you're replacing the weights of the first one; they aren't meant to be mixed. In general you will load a single base Llama model, and optionally a single peft model.
For your case, it sounds like you should follow these steps:
- Load Llama2 with my peft model, then finetune
- After training, load Llama2 with your peft model, then perform inference, additional finetuning, etc.
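The two steps above might look like this when written against the `peft` library directly. This is a sketch under assumptions: it uses `PeftModel.from_pretrained` with its `is_trainable` flag rather than the repo's `load_peft_model` helper, which may behave differently, and it assumes training continues from the released adapter and saves the result to your own output directory. `adapter_for_stage` is a hypothetical helper.

```python
def adapter_for_stage(stage: str, released_adapter: str, my_adapter: str) -> tuple:
    """Return (adapter_path, is_trainable); exactly one adapter is used per stage."""
    stages = {
        "finetune": (released_adapter, True),  # start training from the released peft
        "inference": (my_adapter, False),      # afterwards, load only your own peft
    }
    if stage not in stages:
        raise ValueError(f"unknown stage: {stage!r}")
    return stages[stage]


def load_single_adapter(base_dir: str, adapter_path: str, is_trainable: bool):
    """Load the base model with one adapter, never two stacked on each other."""
    # Lazy imports: transformers and peft are only needed when actually loading.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_dir)
    return PeftModel.from_pretrained(base, adapter_path, is_trainable=is_trainable)


# Usage with the paths from the question:
#   path, trainable = adapter_for_stage(
#       "inference", "/model/llama-2-amharic-3784m", "/home/user/model/output")
#   model = load_single_adapter("/model/Llama-2-7b-hf", path, trainable)
```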
If your model isn't performing as expected, there may be an issue with your dataset or training process. One way to debug is to first try a very simple dataset of a couple thousand identical items (all the same training example) and see whether you can get the model to overfit, reach 0 loss on it, and run inference properly, before moving on to the actual dataset.
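A debugging dataset like the one described can be generated in a few lines. This is a sketch: the JSON list layout and the `instruction`/`input`/`output` field names in the usage comment are assumptions, so adapt them to whatever format your training script reads.

```python
import json


def write_overfit_dataset(path: str, example: dict, copies: int = 2000) -> int:
    """Write `copies` identical training examples to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Amharic text readable in the file.
        json.dump([example] * copies, f, ensure_ascii=False)
    return copies


# Usage (field names are hypothetical):
#   write_overfit_dataset(
#       "overfit.json",
#       {"instruction": "Say hello.", "input": "", "output": "ሰላም"},
#   )
```

If the loss does not fall toward 0 on this file, or inference does not return the single memorized answer, the problem is in the training setup rather than in the real data.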
Can anyone explain the steps one by one in detail, with the file structure of all folders and files? I have completed steps 1 and 2 (accept the Llama2 license and download the Llama2 weights; download the Amharic finetune from this repository as shown here https://huggingface.co/docs/hub/models-downloading). But for the third one, when I try to convert Llama to the HF format, tokenizer.model was not available in llama-2-7b, although it was available in the llama folder. I tried copying that file into llama-2-7b, but nothing works.
Not sure if a lot has changed, but this method worked for me back in January:
1. Accept the Llama2 license on Hugging Face and download it like this:
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-hf
2. Download the Amharic finetune from Hugging Face like this:
git lfs install
git clone https://huggingface.co/iocuydi/llama-2-amharic-3784m
3. Clone this GitHub repository: https://github.com/iocuydi/amharic-llama-llava
4. Then inside inference/run_inf.py:
- Comment out the import safety_utils line
- Change MAIN_PATH to the path of the folder you downloaded in step 1
- Change peft_model to the path you cloned in step 2
- Go to your Llama2 folder (from step 1) and replace the tokenizer.model file with the one from step 2
- Set quantization=True inside the main function before the load_model function call
5. Finally run the inference/run_inf.py file
@iocuydi I was a little confused following the discussion, which is why I asked about the file structure, but now, thanks to @abdimussa87, it is clear. One more question: is the size of https://huggingface.co/meta-llama/Llama-2-7b-hf more than 14 GiB? I was trying to test the model on Google Colab, but because the download is so large I am unable to complete it. Here is the error I am facing
And without those files, does the model work?
How much space do I need to run this Amharic model? Is there any alternative way of using it?
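A rough size estimate can be worked out from the parameter count. The sketch below assumes about 7 billion parameters stored at 2 bytes each (16-bit weights); the real total on disk also includes the tokenizer, config files, and the peft weights. If a repository ships its weights in more than one file format, a full clone downloads every copy, so the download can be larger than this estimate.

```python
import shutil


def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Approximate size of the raw weights: parameters times bytes per parameter."""
    return n_params * bytes_per_param


def has_room(path: str, needed_bytes: int) -> bool:
    """Check whether the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


needed = weight_bytes(7_000_000_000)  # assumed parameter count for a 7B model
print(f"approx. {needed / 2**30:.1f} GiB for the weights alone")
print("enough free space here:", has_room(".", needed))
```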