Update README.md

3db040b verified 29 days ago

5.96 kB

	---
	library_name: transformers
	tags:
	- medical-qa
	- healthcare
	- llama
	- fine-tuned
	- llama-cpp
	- gguf-my-repo
	license: llama3.2
	datasets:
	- ruslanmv/ai-medical-chatbot
	base_model: Ellbendls/llama-3.2-3b-chat-doctor
	---

	# Triangle104/llama-3.2-3b-chat-doctor-Q5_K_M-GGUF
	This model was converted to GGUF format from [`Ellbendls/llama-3.2-3b-chat-doctor`](https://huggingface.co/Ellbendls/llama-3.2-3b-chat-doctor) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/Ellbendls/llama-3.2-3b-chat-doctor) for more details on the model.

	---
	Model details:
	-
	Llama-3.2-3B-Chat-Doctor is a specialized medical question-answering model based on the Llama 3.2 3B architecture. This model has been fine-tuned specifically for providing accurate and helpful responses to medical-related queries.

	Developed by: Ellbendl Satria
	Model type: Language Model (Conversational AI)
	Language: English
	Base Model: Meta Llama-3.2-3B-Instruct
	Model Size: 3 Billion Parameters
	Specialization: Medical Question Answering
	License: llama3.2

	Model Capabilities

	Provides informative responses to medical questions
	Assists in understanding medical terminology and health-related concepts
	Offers preliminary medical information (not a substitute for professional medical advice)

	Direct Use

	This model can be used for:

	Providing general medical information
	Explaining medical conditions and symptoms
	Offering basic health-related guidance
	Supporting medical education and patient communication

	Limitations and Important Disclaimers

	⚠️ CRITICAL WARNINGS:

	NOT A MEDICAL PROFESSIONAL: This model is NOT a substitute for professional medical advice, diagnosis, or treatment.
	Always consult a qualified healthcare provider for medical concerns.
	The model's responses should be treated as informational only and not as medical recommendations.

	Out-of-Scope Use

	The model SHOULD NOT be used for:

	Providing emergency medical advice
	Diagnosing specific medical conditions
	Replacing professional medical consultation
	Making critical healthcare decisions

	Bias, Risks, and Limitations
	Potential Biases

	May reflect biases present in the training data
	Responses might not account for individual patient variations
	Limited by the comprehensiveness of the training dataset

	Technical Limitations

	Accuracy is limited to the knowledge in the training data
	May not capture the most recent medical research or developments
	Cannot perform physical examinations or medical tests

	Recommendations

	Always verify medical information with professional healthcare providers
	Use the model as a supplementary information source
	Be aware of potential inaccuracies or incomplete information

	Training Details
	Training Data

	Source Dataset: ruslanmv/ai-medical-chatbot
	Base Model: Meta Llama-3.2-3B-Instruct

	Training Procedure

	[Provide details about the fine-tuning process, if available]

	Fine-tuning approach
	Computational resources used
	Training duration
	Specific techniques applied during fine-tuning

	How to Use the Model
	Hugging Face Transformers

	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "Ellbendls/llama-3.2-3b-chat-doctor"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	# Example usage
	input_text = "I had a surgery which ended up with some failures. What can I do to fix it?"

	# Prepare inputs with explicit padding and attention mask
	inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

	# Generate response with more explicit parameters
	outputs = model.generate(
	input_ids=inputs['input_ids'],
	attention_mask=inputs['attention_mask'],
	max_new_tokens=150, # Specify max new tokens to generate
	do_sample=True, # Enable sampling for more diverse responses
	temperature=0.7, # Control randomness of output
	top_p=0.9, # Nucleus sampling to maintain quality
	num_return_sequences=1 # Number of generated sequences
	)

	# Decode the generated response
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(response)

	Ethical Considerations

	This model is developed with the intent to provide helpful, accurate, and responsible medical information. Users are encouraged to:

	Use the model responsibly
	Understand its limitations
	Seek professional medical advice for serious health concerns

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q5_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q5_k_m.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q5_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q5_k_m.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
	```
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```
	./llama-cli --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q5_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q5_k_m.gguf -p "The meaning to life and the universe is"
	```
	or
	```
	./llama-server --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q5_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q5_k_m.gguf -c 2048
	```