---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- Mike0307/alpaca-en-zhtw
language:
- zh
pipeline_tag: text-generation
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

## Download Model

The base model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently requires the development version of `transformers` and a nightly build of `torch`. It must also be loaded with `trust_remote_code=True` passed to `from_pretrained`.

```
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```

The LoRA adapter also requires the `peft` package.

```
pip install peft
```
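
Before loading the model, you can confirm that the packages above are importable. The following is a minimal sketch that uses only the standard library (`importlib.util.find_spec`). The helper name `missing_packages` is hypothetical, and it checks only whether each package is present, not whether you have the development or nightly builds:

```python
import importlib.util

# Packages used by the snippets in this card; accelerate is needed for device_map.
REQUIRED = ["torch", "transformers", "accelerate", "peft"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it in the environment you will use for inference. An empty result means all four packages were found.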

Now download the adapter. With `peft` installed, `AutoModelForCausalLM.from_pretrained` loads the base model and attaches the LoRA weights from the adapter repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="mps",             # "mps" is for macOS (Apple silicon) users
    torch_dtype=torch.float32,    # try torch.float16 to reduce memory use
    trust_remote_code=True,
    attn_implementation="eager",  # run without flash_attn
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
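
The loading code hard-codes `device_map="mps"`, which works only on Apple silicon Macs. The sketch below assumes you want one script that also runs on an NVIDIA GPU or a plain CPU. The hypothetical helper `pick_device` takes the availability flags as arguments, which keeps the selection logic independent of torch. At call time, you pass in `torch.cuda.is_available()` and `torch.backends.mps.is_available()`:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string usable as device_map= or in .to(...).

    Preference order is an assumption (adjust as needed): CUDA, then Apple MPS, then CPU.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


print(pick_device(cuda_available=False, mps_available=True))  # → mps

# Usage with the loading code above (requires torch):
# device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     device_map=device,
#     torch_dtype=torch.float32,
#     trust_remote_code=True,
#     attn_implementation="eager",
# )
```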

## Inference Example

```python
# On an M2 Pro this example takes about 3 seconds.
# Prompt (Traditional Chinese): "Divide these five animals into two groups.
# Tiger, shark, elephant, whale, kangaroo"
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text,
    return_tensors="pt",
).to(model.device)  # same device the model was loaded on ("mps" above)

outputs = model.generate(
    **inputs,
    max_length=500,   # total length, prompt tokens included
    do_sample=False,  # greedy decoding, so no temperature is needed
)

generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(generated_text)
```
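
The prompt above is written by hand in Phi-3's chat markup. A small helper can apply the same layout to other questions. `build_prompt` is a hypothetical name. It reproduces the exact string format used in these examples: no newline after `<|user|>` and a single space before `<|end|>`. That whitespace may differ from the output of the base model's own chat template (`tokenizer.apply_chat_template`), so treat this helper as an assumption about the format the adapter expects:

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the prompt layout used by this card's examples.

    Hypothetical helper: copies the examples' format exactly
    (no newline after <|user|>, one space before <|end|>).
    """
    return f"<|user|>{user_message} <|end|>\n<|assistant|>"


# Rebuilds the example prompt ("Divide these five animals into two groups.
# Tiger, shark, elephant, whale, kangaroo")
input_text = build_prompt("將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠")
print(input_text)
```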

## Streaming Example

```python
from transformers import TextStreamer

# Prints the decoded text to stdout as tokens are generated
streamer = TextStreamer(tokenizer)

input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text,
    return_tensors="pt",
).to(model.device)  # follows the device chosen in from_pretrained

outputs = model.generate(
    **inputs,
    do_sample=False,  # greedy decoding
    streamer=streamer,
    max_length=500,
)

generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
```
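
In both examples, `tokenizer.decode(outputs[0], ...)` returns the prompt followed by the answer. This happens because `generate` on a decoder-only model returns the input tokens and the new tokens as one sequence. The sketch below keeps only the answer. The hypothetical helper `completion_only` drops the first `len(prompt_ids)` entries and works on plain lists as well as 1-D tensors. In the streaming case, `TextStreamer(tokenizer, skip_prompt=True)` similarly keeps the prompt out of the printed output.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def completion_only(output_ids: Sequence[T], prompt_ids: Sequence[T]) -> Sequence[T]:
    """Return only the tokens generated after the prompt.

    generate() returns prompt + new tokens in a single row, so slicing off
    the first len(prompt_ids) positions leaves the model's answer.
    """
    return output_ids[len(prompt_ids):]


# Toy check with plain token-id lists
print(completion_only([101, 102, 7, 8, 9], [101, 102]))  # → [7, 8, 9]

# With the objects from the examples above (needs the loaded model):
# answer = tokenizer.decode(
#     completion_only(outputs[0], inputs["input_ids"][0]),
#     skip_special_tokens=True,
# )
```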

## Example of RAG with LangChain

[This reference](https://huggingface.co/Mike0307/text2vec-base-chinese-rag#example-of-langchain-rag) shows how to wrap this Phi-3 LoRA model as a custom LangChain LLM.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6414866f1cbd604c9217c7d0/RrBoHJINfrSWtCNkePs7g.png)