ruslanmv committed (verified)
Commit d58b76d · 1 Parent(s): 0905571

Delete README.md

Files changed (1):
  README.md +0 -46
README.md DELETED

---
tags:
- gguf
- llama.cpp
- quantized
- ruslanmv/Medical-Llama3-v2
license: apache-2.0
---

# ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF

This model was converted to GGUF format from [`ruslanmv/Medical-Llama3-v2`](https://huggingface.co/ruslanmv/Medical-Llama3-v2) using llama.cpp via the [Convert Model to GGUF](https://huggingface.co/spaces/ruslanmv/convert_to_gguf) Space.
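
For reference, a minimal sketch of how a conversion like this can be reproduced locally with llama.cpp's own tooling; the paths and output filenames below are assumptions, and the conversion Space may use different defaults:

```bash
# Fetch llama.cpp and its conversion dependencies
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# Download the original HF checkpoint (directory name is illustrative)
huggingface-cli download ruslanmv/Medical-Llama3-v2 --local-dir Medical-Llama3-v2

# Convert the checkpoint to an FP16 GGUF file
python llama.cpp/convert_hf_to_gguf.py Medical-Llama3-v2 \
  --outfile medical-llama3-v2-f16.gguf --outtype f16

# Quantize to Q4_K_M (llama-quantize comes from building llama.cpp)
./llama.cpp/build/bin/llama-quantize \
  medical-llama3-v2-f16.gguf medical-llama3-v2-q4_k_m.gguf Q4_K_M
```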

**Key Features:**

* Quantized to Q4_K_M for reduced file size (GGUF format)
* Optimized for use with llama.cpp
* Compatible with llama-server for efficient serving

Refer to the [original model card](https://huggingface.co/ruslanmv/Medical-Llama3-v2) for more details on the base model.

## Usage with llama.cpp

**1. Install llama.cpp:**

```bash
brew install llama.cpp # For macOS/Linux
```
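
If Homebrew is not available, llama.cpp can also be built from source; the steps below follow the project's standard CMake build (see its README for platform specifics):

```bash
# Build llama.cpp from source (requires git and CMake)
git clone https://github.com/ggerganov/llama.cpp
cmake -S llama.cpp -B llama.cpp/build
cmake --build llama.cpp/build --config Release
# Binaries such as llama-cli and llama-server land in llama.cpp/build/bin
```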

**2. Run Inference:**

**CLI:**

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -p "Your prompt here"
```
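
For example, a single question with a capped response length; the flags are standard llama-cli options, and the prompt is illustrative only:

```bash
llama-cli --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF \
  --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf \
  -c 2048 -n 256 \
  -p "What are common symptoms of iron-deficiency anemia?"
```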

**Server:**

```bash
llama-server --hf-repo ruslanmv/Medical-Llama3-v2-Q4_K_M-GGUF --hf-file Medical-Llama3-v2-Q4_K_M-GGUF-4bit.gguf -c 2048
```
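
Once running, llama-server exposes llama.cpp's OpenAI-compatible HTTP API (port 8080 by default); a minimal request might look like the following, with the message and token limit as illustrative values:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are common causes of persistent fatigue?"}
    ],
    "max_tokens": 256
  }'
```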

For more advanced usage, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).