Download Model
The base model microsoft/Phi-3-mini-4k-instruct currently requires
the latest development versions of transformers and torch.
It also requires passing trust_remote_code=True to the from_pretrained function.
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Additionally, the LoRA adapter requires the peft package.
pip install peft
Now download the adapter and load it together with the tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="mps",  # mps is for macOS users
    torch_dtype=torch.float32,  # try float16 if needed
    trust_remote_code=True,
    attn_implementation="eager",  # without flash_attn
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
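The loading call above hard-codes the "mps" device. A minimal sketch of choosing the device at runtime instead, assuming the same call also accepts "cuda" or "cpu" as its device_map string. The helper names pick_device, detect_device, and load_model are hypothetical and not part of this model card:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask torch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


def load_model(model_id: str, device: str):
    """Load model and tokenizer with the same arguments as the card's example."""
    # Imported lazily so the helpers above stay usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map=device,  # assumption: replaces the hard-coded "mps"
        torch_dtype=torch.float32,
        trust_remote_code=True,
        attn_implementation="eager",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer
```

With these helpers, load_model("Mike0307/Phi-3-mini-4k-instruct-chinese-lora", detect_device()) would stand in for the call above. The tokenized inputs in the later examples would then need to be moved to the same device.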
Inference Example
# An M2 Pro takes about 3 seconds to run this example.
# The prompt is in Chinese: "Divide these five animals into two groups.
# Tiger, shark, elephant, whale, kangaroo."
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"
inputs = tokenizer(
    input_text,
    return_tensors="pt"
).to(torch.device("mps"))  # mps is for macOS users
outputs = model.generate(
    **inputs,
    temperature=0.0,  # has no effect while do_sample=False (greedy decoding)
    max_length=500,
    do_sample=False
)
generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True
)
print(generated_text)
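The prompt in the example above is assembled by hand from the <|user|>, <|end|>, and <|assistant|> markers. A small sketch of a helper that builds the same string for any single-turn question; build_prompt is a hypothetical name, and the exact spacing simply mirrors the example rather than any documented template:

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user message in the chat markers used by this card's examples."""
    # The space before <|end|> is copied from the card's example prompt.
    return f"<|user|>{user_message} <|end|>\n<|assistant|>"


# Same question as in the inference example ("Divide these five animals
# into two groups. Tiger, shark, elephant, whale, kangaroo.").
question = "將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠"
input_text = build_prompt(question)
```

The resulting input_text can be passed to the tokenizer exactly as in the example.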
Streaming Example
from transformers import TextStreamer
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"
inputs = tokenizer(
    input_text,
    return_tensors="pt"
).to(torch.device("mps"))  # change "mps" if you are not on macOS
outputs = model.generate(
    **inputs,
    temperature=0.0,  # has no effect while do_sample=False (greedy decoding)
    do_sample=False,
    streamer=streamer,
    max_length=500,
)
generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True
)
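For decoder-only models such as this one, model.generate returns the prompt tokens followed by the newly generated tokens, so the decoded text in both examples includes the prompt. A minimal sketch of keeping only the model's reply by slicing off the prompt; continuation_ids is a hypothetical helper, and plain lists stand in for token id tensors here:

```python
def continuation_ids(output_ids, prompt_length: int):
    """Drop the first prompt_length token ids, keeping only the generated ones."""
    return output_ids[prompt_length:]


# Toy example: plain lists stand in for token id tensors.
prompt = [101, 7, 8]
full_output = prompt + [42, 43, 2]
new_tokens = continuation_ids(full_output, len(prompt))
```

With the objects from the examples, the equivalent call would be roughly tokenizer.decode(continuation_ids(outputs[0], inputs["input_ids"].shape[1]), skip_special_tokens=True). For streaming, TextStreamer also accepts skip_prompt=True, which keeps the prompt out of the streamed output.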
Example of RAG with Langchain
This reference shows how to wrap this Phi-3 LoRA model as a custom LangChain LLM.