Uploaded model
- Developed by: ArnabAr
- License: apache-2.0
- Finetuned from model: unsloth/mistral-7b-bnb-4bit
Use the model:
from unsloth import FastLanguageModel
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
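If you prefer to pin the dtype explicitly instead of relying on auto detection, a minimal sketch (assuming a CUDA GPU; not part of the original card):

import torch
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16  # bfloat16 on Ampere+ GPUs, float16 on T4/V100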
Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ArnabAr/SFT_Lora_Mistral_700_Data_V1", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
    {"from": "human", "value": "Ask your queries."},
]
Prepare input as a single string if the model expects plain text
input_text = messages[0]["value"]
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
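As an alternative (not from the original card), if the tokenizer ships a chat template that uses the standard role/content schema, you can build the prompt with apply_chat_template instead of a raw string; return_dict=True keeps the inputs["input_ids"] access below working:

chat = [{"role": "user", "content": "Ask your queries."}]
inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,  # append the assistant turn marker
    return_dict=True,
    return_tensors="pt",
).to("cuda")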
Generate response
output = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=max_seq_length,
    use_cache=True,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
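To stream tokens to stdout as they are generated instead of decoding at the end, a sketch using transformers' TextStreamer (the max_new_tokens value here is illustrative):

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)  # print only newly generated text
_ = model.generate(
    input_ids=inputs["input_ids"],
    streamer=streamer,
    max_new_tokens=256,
    use_cache=True,
)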
This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for ArnabAr/SFT_Lora_Mistral_700_Data_V1
Base model
unsloth/mistral-7b-bnb-4bit