open-lilm-v2-q4
This is a quantized version of open-lilm-v2 with no other modifications. Like the original model, it is intended only for research or entertainment purposes.
Warning: Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.
Model Details
- Name: open-lilm-v2-q4
- Quantization: 4-bit quantization
- Base Model: 0xtaipoian/open-lilm-v2
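For context, 4-bit quantization stores each weight as a 4-bit integer plus a shared scale factor, cutting memory roughly 4x versus fp16. The sketch below illustrates the idea with simple symmetric absmax quantization in plain Python; the helper names are hypothetical, and the actual scheme used for this checkpoint (e.g. NF4 or GPTQ-style) may differ.

```python
# Toy sketch of symmetric absmax 4-bit quantization (illustrative only;
# real quantization schemes such as NF4 or GPTQ are more sophisticated).

def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-7, 7] plus one scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid divide-by-zero
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from 4-bit integers."""
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 0.7]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)  # close to w, within half a quantization step
```

Each reconstructed weight differs from the original by at most half a quantization step, which is the rounding error this scheme trades for the smaller memory footprint.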
Usage
This model can be used with Hugging Face's Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "liemo/open-lilm-v2-q4"
# device_map="auto" places the model on GPU when one is available
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Render the chat template and move the input to the model's device
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Note: the original snippet called `quantized_model.generate`, but
    # `quantized_model` is never defined; `model` is the loaded model.
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )
    # Optionally inspect the rendered prompt:
    # print(tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))

    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(
        output_ids[0][input_ids.shape[1]:], skip_special_tokens=True
    )
    return response

messages = [
    {"role": "user",
     "content": """
INPUT_CONTENT_HERE
"""}
]

result = chat(messages, max_new_tokens=200, temperature=1)
print(result)
```
Downloads last month: 20
Model tree for liemo/open-lilm-v2-q4
- Base model: hon9kon9ize/CantoneseLLMChat-v0.5
- Finetuned: 0xtaipoian/open-lilm-v2