metadata

library_name: peft
base_model: yahma/llama-7b-hf
language:
  - en
pipeline_tag: text-generation
tags:
  - text-generation-inference

About :

AlpaRA 7B is a model for medical dialogue understanding. It was fine-tuned using the Alpaca configuration on a curated 5,000-instruction dataset that captures nuances in patient-doctor conversations. Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA) makes the model efficient enough to run on consumer-grade GPUs.
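To make the efficiency point concrete, the sketch below counts trainable parameters for a single weight matrix with and without LoRA. The 4096 hidden size is that of LLaMA 7B. The rank of 8 is purely an illustrative assumption, since the actual rank used for this adapter is not stated here, and the function names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 4096  # hidden size of LLaMA 7B
RANK = 8       # assumed rank, for illustration only

full = full_params(HIDDEN, HIDDEN)
lora = lora_params(HIDDEN, HIDDEN, RANK)
print(f"full matrix: {full} parameters")
print(f"LoRA factors: {lora} parameters ({lora / full:.2%} of the full matrix)")
```

Under these assumptions, only the small factors are trained while the base weights stay frozen, which is what keeps memory use within reach of a consumer-grade GPU.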

How to Use :

Load the AlpaRA model

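from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM

# The tokenizer and base weights both come from the base LLaMA 7B checkpoint.
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")

# 8-bit loading needs the bitsandbytes and accelerate packages installed.
model = LlamaForCausalLM.from_pretrained(
    "yahma/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
# Attach the AlpaRA LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, "KalbeDigitalLab/alpara-7b-peft")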

Prompt Template :

Feel free to change the instruction

PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.


### Instruction:
"how to cure flu?"

### Response:"""
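If you want to try several instructions, a small helper can fill in the template instead of editing the string by hand. This is a minimal sketch: `build_prompt` and `HEADER` are hypothetical names, and the layout simply mirrors the PROMPT shown above, including its blank lines.

```python
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Insert an instruction into the same layout as the PROMPT above."""
    return f"{HEADER}\n\n\n### Instruction:\n{instruction}\n\n### Response:"


# Reproduces the example prompt, with the quotes kept as part of the instruction.
PROMPT = build_prompt('"how to cure flu?"')
print(PROMPT)
```

Whether to wrap the instruction in quotes is left to the caller; the helper inserts the text unchanged.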

Evaluation

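# Tokenize the prompt and move the input IDs to the model's device.
inputs = tokenizer(
    PROMPT,
    return_tensors="pt"
)
input_ids = inputs["input_ids"].to(model.device)

print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=512,
)
# Decode each sequence and keep only the text after the response marker.
for s in generation_output.sequences:
    result = tokenizer.decode(s, skip_special_tokens=True).split("### Response:")[1]
    print(result.strip())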
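Indexing the split result with `[1]` raises an `IndexError` if the decoded text does not contain the marker. The sketch below shows one way to make that step tolerant. `extract_response` is a hypothetical helper, not part of any library, and it returns everything after the first marker rather than stopping at a second one.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# Example with a hand-written string standing in for decoded model output.
sample = "### Instruction:\nexample question\n\n### Response:\n example answer "
print(extract_response(sample))
```

In the generation loop, this would be applied to the decoded string for each sequence before printing.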