Uploaded model
- Developed by: ArnabAr
- License: apache-2.0
- Finetuned from model: unsloth/mistral-7b-bnb-4bit
Use the model:
from unsloth import FastLanguageModel
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
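If you prefer to pin the dtype explicitly instead of relying on auto detection, a minimal sketch (assuming a CUDA GPU; not part of the original card):

import torch
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16  # bfloat16 on Ampere+ GPUs, float16 on T4/V100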
Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ArnabAr/SFT_Lora_Mistral_700_Data_V1", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
    {"from": "human", "value": "Ask your queries."},
]
Prepare input as a single string if the model expects plain text
input_text = messages[0]["value"]
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
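As an alternative (not from the original card), if the tokenizer ships a chat template that uses the standard role/content schema, you can build the prompt with apply_chat_template instead of a raw string; return_dict=True keeps the inputs["input_ids"] access below working:

chat = [{"role": "user", "content": "Ask your queries."}]
inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,  # append the assistant turn marker
    return_dict=True,
    return_tensors="pt",
).to("cuda")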
Generate response
output = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=max_seq_length,
    use_cache=True,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
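To stream tokens to stdout as they are generated instead of decoding at the end, a sketch using transformers' TextStreamer (the max_new_tokens value here is illustrative):

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)  # print only newly generated text
_ = model.generate(
    input_ids=inputs["input_ids"],
    streamer=streamer,
    max_new_tokens=256,
    use_cache=True,
)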
This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for ArnabAr/SFT_Lora_Mistral_700_Data_V1
Base model
unsloth/mistral-7b-bnb-4bit