---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- Mike0307/alpaca-en-zhtw
language:
- zh
pipeline_tag: text-generation
base_model:
- microsoft/Phi-3-mini-4k-instruct
---


## Download Model

The base model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently relies on 
the latest development versions of transformers and torch.<br>
It also requires passing *trust_remote_code=True* to the `from_pretrained` function.
```
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```

Additionally, loading the LoRA adapter requires the peft package.
```
pip install peft
```

Now download the adapter and its tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="mps", # mps is for macOS users; change it on other hardware
    torch_dtype=torch.float32,  # try float16 if needed
    trust_remote_code=True,
    attn_implementation="eager", # without flash_attn
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
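
The snippet above hardcodes `"mps"`. On other hardware, the device string can be chosen at runtime instead. The following is a minimal sketch that is not part of the original model card: `pick_device` is a hypothetical helper name, and it assumes a torch build that exposes `torch.cuda.is_available()` and, where present, `torch.backends.mps.is_available()`.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers."""
    # Fall back to CPU when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds may lack the mps backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The returned string could then replace the literal `"mps"` in `device_map` and in `torch.device(...)` in the examples on this page.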

## Inference Example

```python
# An M2 Pro takes about 3 seconds to run this example.
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps")) # mps is for macOS users

outputs = model.generate(
    **inputs, 
    max_length = 500,
    do_sample = False  # greedy decoding; temperature has no effect when sampling is off
)

generated_text = tokenizer.decode(
    outputs[0], 
    skip_special_tokens=True
)
print(generated_text)
```
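
The prompt string above follows a `<|user|>` ... `<|end|>` ... `<|assistant|>` layout. A small helper can build it from plain text so the markers are not retyped by hand. This is a sketch under assumptions: `build_prompt` is a hypothetical name, it covers only a single user turn, and it omits the space before `<|end|>` that appears in the example above.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the chat markers used in the example above."""
    # Assumption: single-turn prompts only; no system message or chat history.
    return f"<|user|>{user_message.strip()}<|end|>\n<|assistant|>"


input_text = build_prompt("將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠")
print(input_text)
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template` may be a more robust way to produce the same kind of string.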


## Streaming Example
```python
from transformers import TextStreamer
streamer = TextStreamer(tokenizer)

input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps")) # change "mps" if not on macOS

outputs = model.generate(
    **inputs, 
    do_sample = False,  # greedy decoding; temperature has no effect when sampling is off
    streamer=streamer,  # prints tokens to stdout as they are generated
    max_length=500,
)

generated_text = tokenizer.decode(
    outputs[0], 
    skip_special_tokens=True
)
```

## Example of RAG with Langchain

[This reference](https://huggingface.co/Mike0307/text2vec-base-chinese-rag#example-of-langchain-rag) shows how to wrap this Phi-3 LoRA model as a custom LangChain LLM.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6414866f1cbd604c9217c7d0/RrBoHJINfrSWtCNkePs7g.png)