---
license: llama3
pipeline_tag: text-generation
language:
- ne
library_name: transformers
tags:
- unsloth
- pytorch
- llama
- llama-3
- conversational
---

This model is an initial test version, fine-tuned for the Nepali language from the LLaMA-3-8B base model provided by UnslothAI.

## Model Details

Directly quantized 4-bit model with bitsandbytes. Built with Meta Llama 3. By UnslothAI.

- **Developed by:** Norden Ghising Tamang under DarviLab Pvt. Ltd.
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** Nepali
- **License:** A custom commercial license is available at: https://llama.meta.com/llama3/license

## How To Use

### Using HuggingFace's AutoPeftModelForCausalLM

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1")
```

### Using UnslothAI [2x Faster Inference]

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nordenxgt/nelm-chat-unsloth-llama3-v.0.0.1",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode
```

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "गौतम बुद्धको जन्म कुन देशमा भएको थियो?",  # instruction ("In which country was Gautam Buddha born?")
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
tokenizer.batch_decode(outputs)
```
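Note that `batch_decode` returns the full prompt together with the completion. One way to recover only the model's answer is to slice off the prompt tokens before decoding. A minimal sketch, assuming the `model`, `tokenizer`, `inputs`, and `outputs` variables from the example above (the helper name `extract_response` is hypothetical, not part of any library):

```python
def extract_response(outputs, inputs, tokenizer):
    # Hypothetical helper: drop the prompt tokens, keeping only
    # the newly generated ones, then decode to text.
    prompt_len = inputs["input_ids"].shape[1]
    generated = outputs[:, prompt_len:]
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(extract_response(outputs, inputs, tokenizer))
```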
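For interactive use, streaming tokens as they are generated may be preferable to decoding the whole sequence at the end. A minimal sketch using transformers' `TextStreamer`, again assuming the prompt setup from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt=True
# suppresses echoing the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=64, use_cache=True)
```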