metadata
library_name: peft
base_model: yahma/llama-7b-hf
language:
- en
pipeline_tag: text-generation
tags:
- text-generation-inference
About :
AlpaRA 7B is a model for medical dialogue understanding. It was fine-tuned with the Alpaca configuration on a curated dataset of 5,000 instructions that captures the nuances of patient-doctor conversations. Fine-tuning used Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA), which keeps the model efficient on consumer-grade GPUs.
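To give a rough sense of why LoRA reduces the training cost, the sketch below counts the trainable parameters that a low-rank update adds to a single square projection matrix. The hidden size of 4096 matches LLaMA 7B, but the rank of 8 is an assumption chosen for illustration and is not stated in this card; `lora_trainable_params` is a hypothetical helper, not part of `peft`.

```python
# Illustrative arithmetic only: the rank below is an assumption,
# not the value used to train AlpaRA.
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

hidden = 4096   # hidden size of LLaMA 7B
rank = 8        # hypothetical LoRA rank

full = hidden * hidden                              # one dense projection matrix
lora = lora_trainable_params(hidden, hidden, rank)  # its low-rank update

print(f"dense weights: {full:,}")
print(f"LoRA weights:  {lora:,}")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions, only a small fraction of each adapted matrix is trained while the base weights stay frozen.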
How to Use :
Load the AlpaRA model
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "yahma/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "KalbeDigitalLab/alpara-7b-peft")
Prompt Template :
Feel free to change the instruction
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
"how to cure flu?"
### Response:"""
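If you want to swap in different instructions without editing the string by hand, a small helper along these lines may be convenient. The template text is copied from the PROMPT above; `TEMPLATE` and `build_prompt` are hypothetical names introduced here and are not part of `peft` or `transformers`. The instruction is inserted as given, so include the quotation marks if you want to match the PROMPT above exactly.

```python
# Template text copied from the PROMPT above; build_prompt is a
# hypothetical helper, not a library function.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:"
)

def build_prompt(instruction: str) -> str:
    """Fill the Alpaca-style template with a single instruction."""
    return TEMPLATE.format(instruction=instruction.strip())

PROMPT = build_prompt("how to cure flu?")
print(PROMPT)
```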
Evaluation
inputs = tokenizer(
    PROMPT,
    return_tensors="pt",
)
# Move the input IDs to the device the model was placed on
input_ids = inputs["input_ids"].to(model.device)
print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=512,
)
# Decode each generated sequence and keep only the text after the response marker
for s in generation_output.sequences:
    result = tokenizer.decode(s).split("### Response:")[1]
    print(result)
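The `split("### Response:")[1]` step above raises an `IndexError` if the marker is missing from the decoded text, and it leaves the end-of-sequence token in the output. A slightly more defensive variant could look like the sketch below. `extract_response` is a hypothetical helper, and it assumes the decoded string uses `</s>` as the end-of-sequence token, as the LLaMA tokenizer does. Unlike `split(...)[1]`, it keeps everything after the first marker.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return the text after the first response marker, without the EOS token."""
    # partition never raises; `found` is empty when the marker is absent
    _, found, tail = decoded.partition(marker)
    text = tail if found else decoded
    # Drop the LLaMA end-of-sequence token and surrounding whitespace
    return text.replace("</s>", "").strip()

# Stand-in for the output of tokenizer.decode(s); not real model output
sample = "<s> ### Instruction:\nhow to cure flu?\n### Response: Rest and drink fluids.</s>"
print(extract_response(sample))
```

In the loop above, this would replace the `split` call, for example `result = extract_response(tokenizer.decode(s))`.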