thesven
/

Llama3-8B-SFT-SyntheticMedical-bnb-4bit

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Llama3-8B-SFT-SyntheticMedical-bnb-4bit / README.md

thesven's picture

Update README.md

464ea09 verified 8 months ago

|

1.94 kB

	---
	language:
	- en
	license: llama3
	library_name: transformers
	tags:
	- biology
	- medical
	datasets:
	- thesven/SyntheticMedicalQA-4336
	---

	# Llama3-8B-SFT-SyntheticMedical-bnb-4bit

	<!-- Provide a quick summary of what the model is/does. -->

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324ce4d5d0cf5c62c6e3c5a/ZMeYpx2-wRbla__Tf6fvr.png)

	## Model Details

	### Model Description

	Llama3-8B-SFT-SSyntheticMedical-bnb-4bit is trained using the SFT method via QLoRA on 4336 rows of medical data to enhance it's abilities in the realm of scientific anatomy.

	This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

	### Using the model with transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	model_name_or_path = "thesven/Llama3-8B-SFT-SyntheticMedical-bnb-4bit"

	# BitsAndBytesConfig for loading the model in 4-bit precision
	bnb_config = BitsAndBytesConfig(
	load_in_4bit=True,
	bnb_4bit_quant_type="nf4",
	bnb_4bit_compute_dtype="float16",
	)

	tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
	model = AutoModelForCausalLM.from_pretrained(
	model_name_or_path,
	device_map="auto",
	trust_remote_code=False,
	revision="main",
	quantization_config=bnb_config
	)
	model.pad_token = model.config.eos_token_id

	prompt_template = '''
	<\|begin_of_text\|><\|start_header_id\|>system<\|end_header_id\|>

	You are an expert in the field of anatomy, help explain its topics to me.<\|eot_id\|><\|start_header_id\|>user<\|end_header_id\|>

	What is the function of the hamstring?<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>
	'''

	input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
	output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)

	print(generated_text)

	```