---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- Mike0307/alpaca-en-zhtw
language:
- zh
pipeline_tag: text-generation
base_model:
- microsoft/Phi-3-mini-4k-instruct
---
## Download Model
The base model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently relies on the development versions of transformers and torch.<br>
It also requires *trust_remote_code=True* as an argument to the from_pretrained function.
```
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```
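Optionally, a quick sanity check that the dev builds were picked up (the exact version strings will vary):
```python
# Optional sanity check: confirm the installed versions.
import torch
import transformers

print(transformers.__version__)  # expect a dev build, e.g. "4.x.0.dev0"
print(torch.__version__)
```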
Additionally, loading the LoRA adapter requires the peft package.
```
pip install peft
```
Now, let's download the adapter model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="mps",             # "mps" is for macOS users; use "cuda" or "cpu" otherwise
    torch_dtype=torch.float32,    # try torch.float16 if needed
    trust_remote_code=True,
    attn_implementation="eager",  # avoids the flash_attn dependency
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
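If you prefer to load the base model and attach the adapter explicitly, a minimal peft sketch looks like this (assuming the adapter repo's config points at the base model; the loading arguments mirror the call above):
```python
# Sketch: load the base model, then attach the LoRA adapter with peft.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="mps",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    attn_implementation="eager",
)
model = PeftModel.from_pretrained(base, "Mike0307/Phi-3-mini-4k-instruct-chinese-lora")
```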
## Inference Example
```python
# An M2 Pro takes about 3 seconds on this example.
# Prompt: "Divide these five animals into two groups.\nTiger, shark, elephant, whale, kangaroo"
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"
inputs = tokenizer(
    input_text,
    return_tensors="pt",
).to(torch.device("mps"))  # "mps" is for macOS users

outputs = model.generate(
    **inputs,
    temperature=0.0,  # ignored when do_sample=False
    max_length=500,
    do_sample=False,  # greedy decoding
)
generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(generated_text)
```
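The prompt above hand-writes Phi-3's chat markers. Assuming the tokenizer ships the corresponding chat template, the same prompt can be built with apply_chat_template, as in this sketch:
```python
# Sketch: build the prompt via the tokenizer's chat template instead of
# writing <|user|> ... <|end|> markers by hand (assumes a template is shipped).
messages = [
    {"role": "user", "content": "將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # appends the <|assistant|> turn
    return_tensors="pt",
).to(torch.device("mps"))
outputs = model.generate(input_ids, max_length=500, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```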
## Streaming Example
```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
# Prompt: "Divide these five animals into two groups.\nTiger, shark, elephant, whale, kangaroo"
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"
inputs = tokenizer(
    input_text,
    return_tensors="pt",
).to(torch.device("mps"))  # change "mps" if not on macOS

outputs = model.generate(
    **inputs,
    temperature=0.0,  # ignored when do_sample=False
    do_sample=False,
    streamer=streamer,  # prints tokens to stdout as they are generated
    max_length=500,
)
generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
```
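TextStreamer prints straight to stdout. To consume the stream programmatically (e.g. in a web UI), a sketch with TextIteratorStreamer, reusing the `inputs` from above:
```python
# Sketch: iterate over generated text chunks instead of printing to stdout.
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_length=500, do_sample=False),
)
thread.start()
for text_chunk in streamer:  # yields decoded text as it is generated
    print(text_chunk, end="", flush=True)
thread.join()
```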
## Example of RAG with LangChain
[This reference](https://huggingface.co/Mike0307/text2vec-base-chinese-rag#example-of-langchain-rag) shows how to wrap this Phi-3 LoRA model as a custom LangChain LLM.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6414866f1cbd604c9217c7d0/RrBoHJINfrSWtCNkePs7g.png) |
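For orientation, a minimal sketch of such a wrapper (the class name Phi3LLM is illustrative, and `model`/`tokenizer` come from the loading example above; see the linked reference for the full RAG setup):
```python
# Sketch: a custom LangChain LLM around the model loaded above.
from langchain_core.language_models.llms import LLM

class Phi3LLM(LLM):  # illustrative name, not from the reference
    @property
    def _llm_type(self) -> str:
        return "phi-3-mini-4k-instruct-chinese-lora"

    def _call(self, prompt: str, stop=None, **kwargs) -> str:
        text = f"<|user|>{prompt} <|end|>\n<|assistant|>"
        ids = tokenizer(text, return_tensors="pt").to(torch.device("mps"))
        out = model.generate(**ids, max_length=500, do_sample=False)
        # Drop the prompt tokens before decoding the answer.
        return tokenizer.decode(out[0][ids["input_ids"].shape[-1]:], skip_special_tokens=True)
```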