nordenxgt committed · Commit 11a59de · verified · 1 Parent(s): 2df3cbc

Update README.md

Files changed (1): README.md (+53, -1)
README.md CHANGED
@@ -20,4 +20,56 @@ Directly quantized 4bit model with bitsandbytes. Built with Meta Llama 3. By Uns
  - **Developed by:** Norden Ghising Tamang under DarviLab Pvt. Ltd
  - **Model type:** Transformer-based language model
  - **Language(s) (NLP):** Nepali
- - **License:** A custom commercial license is available at: https://llama.meta.com/llama3/license
+ - **License:** A custom commercial license is available at: https://llama.meta.com/llama3/license
+
+ ## How To Use
+
+ ### Using Hugging Face's AutoPeftModelForCausalLM
+
+ ```python
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
+     load_in_4bit=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1")
+ ```
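+
+ The loaded `model` and `tokenizer` can be used for generation right away. A minimal sketch (the prompt and generation settings below are illustrative, not part of the original card):
+
+ ```python
+ # Tokenize a Nepali prompt and move it to the GPU the 4-bit model lives on.
+ inputs = tokenizer("गौतम बुद्धको जन्म कुन देशमा भएको थियो?", return_tensors="pt").to("cuda")
+
+ # Generate up to 64 new tokens and decode, skipping special tokens.
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
+ ```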
+
+ ### Using UnslothAI [2x Faster Inference]
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
+     max_seq_length=2048,
+     dtype=None,  # auto-detect: float16 on older GPUs, bfloat16 on Ampere+
+     load_in_4bit=True,
+ )
+ FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode
+ ```
+
+ ```python
+ alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ {}
+
+ ### Input:
+ {}
+
+ ### Response:
+ {}"""
+
+ inputs = tokenizer(
+     [
+         alpaca_prompt.format(
+             "गौतम बुद्धको जन्म कुन देशमा भएको थियो?",  # instruction: "In which country was Gautam Buddha born?"
+             "",  # input
+             "",  # output - leave this blank for generation!
+         )
+     ], return_tensors="pt").to("cuda")
+
+ outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
+ tokenizer.batch_decode(outputs)
+ ```
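+
+ To print tokens as they are generated rather than decoding the whole batch at the end, the `TextStreamer` from `transformers` can be passed to `generate`. A sketch, assuming the same `model`, `tokenizer`, and `inputs` as above:
+
+ ```python
+ from transformers import TextStreamer
+
+ # Stream decoded tokens to stdout as they are produced, hiding the prompt.
+ streamer = TextStreamer(tokenizer, skip_prompt=True)
+ _ = model.generate(**inputs, streamer=streamer, max_new_tokens=64, use_cache=True)
+ ```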