---
license: llama3
language:
- ne
library_name: transformers
base_model: unsloth/llama-3-8b-bnb-4bit
tags:
- unsloth
- pytorch
- llama-3
- conversational
---

This is an initial test version, fine-tuned for the Nepali language from the 4-bit LLaMA-3-8B base model provided by UnslothAI.

## Model Details

Fine-tuned from `unsloth/llama-3-8b-bnb-4bit`, a model quantized directly to 4-bit with bitsandbytes by UnslothAI. Built with Meta Llama 3. A rough sketch of this style of 4-bit loading follows the list below.

- **Developed by:** Norden Ghising Tamang under DarviLab Pvt. Ltd.
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** Nepali
- **License:** A custom commercial license is available at: https://llama.meta.com/llama3/license
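
For context, a direct 4-bit load with bitsandbytes in plain transformers looks roughly like the sketch below. This is an illustration of the technique, not necessarily the exact quantization configuration used for `unsloth/llama-3-8b-bnb-4bit`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit bitsandbytes setup; the exact quantization
# parameters of the released base checkpoint may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
```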
## How To Use

### Using HuggingFace's AutoPeftModelForCausalLM

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1")
```
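
Once loaded, generation follows the standard transformers pattern. A minimal sketch (the prompt is illustrative; for best results, wrap it in the Alpaca-style template shown in the next section):

```python
# The question translates to "In which country was Gautam Buddha born?"
inputs = tokenizer("गौतम बुद्धको जन्म कुन देशमा भएको थियो?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```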
### Using UnslothAI [2x Faster Inference]

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```
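
Here `dtype=None` lets Unsloth auto-detect the compute dtype (float16 on older GPUs, bfloat16 on Ampere and newer), and `FastLanguageModel.for_inference(model)` switches the model into Unsloth's faster inference mode.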
```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "गौतम बुद्धको जन्म कुन देशमा भएको थियो?",  # instruction: "In which country was Gautam Buddha born?"
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs)[0])
```
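
To stream tokens to stdout as they are generated, transformers' `TextStreamer` can be passed to `generate`. A small sketch under the same setup as above:

```python
from transformers import TextStreamer

# Print decoded tokens as soon as they are generated,
# skipping the echoed prompt.
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=64, use_cache=True)
```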