open-lilm-v2-q4

This is a quantized version of open-lilm-v2 with no other modifications. Like the original model, it is intended only for research or entertainment purposes.

Warning: Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.

Model Details

  • Name: open-lilm-v2-q4
  • Quantization: 4-bit
  • Base Model: 0xtaipoian/open-lilm-v2

Usage

This model can be used with Hugging Face's Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "liemo/open-lilm-v2-q4"

# Load the model and move it to the GPU so it matches the inputs below
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Render the conversation with the model's chat template and move it to the GPU
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda:0")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )

    # Optional: inspect the rendered ChatML prompt
    chatml = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    print(chatml)

    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

    return response

messages = [
    {"role": "user",
     "content": """
    INPUT_CONTENT_HERE
     """}
]

result = chat(messages, max_new_tokens=200, temperature=1)
print(result)
Safetensors

  • Model size: 3.39B params
  • Tensor types: F32, FP16, U8
