|
---
license: apache-2.0
pipeline_tag: text-generation
tags:
- text-generation
- quantized
base_model: 0xtaipoian/open-lilm-v2
library_name: transformers
widget:
- text: "大學是三年好還是四年好?敢情是四年制好。大學不一定是學術自由的場所,還是戀愛轉型的金鐘中途站。"
---
|
# open-lilm-v2-q4 |
|
|
|
This is a quantized version of open-lilm-v2 with no other modifications.

Like the original model, it is intended for research or entertainment purposes only.
|
|
|
Warning: Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.
|
|
|
## Model Details |
|
|
|
- Name: open-lilm-v2-q4
- Quantization: 4-bit (see the loading sketch after this list)
- Base Model: 0xtaipoian/open-lilm-v2
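
This card does not state which 4-bit scheme was used to produce the checkpoint. As an illustration only, the sketch below shows one common way to load the base model in 4-bit with Transformers and bitsandbytes; the `BitsAndBytesConfig` values are assumptions, not the settings used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit settings (assumptions, not the exact recipe used here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the base model on the fly at load time.
base = AutoModelForCausalLM.from_pretrained(
    "0xtaipoian/open-lilm-v2",
    quantization_config=bnb_config,
    device_map="auto",
)
```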
|
|
|
## Usage |
|
|
|
This model can be used with Hugging Face's Transformers library: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "liemo/open-lilm-v2-q4"

# device_map="auto" places the model on a GPU when one is available.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Build the prompt from the chat template and move it to the model's device.
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )

    # Print the rendered prompt for inspection.
    chatml = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    print(chatml)

    # Decode only the newly generated tokens, keeping special tokens visible.
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
    return response

messages = [
    {"role": "user",
     "content": """
INPUT_CONTENT_HERE
"""}
]

result = chat(messages, max_new_tokens=200, temperature=1.0)
print(result)
```
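
With `do_sample=True`, `temperature` controls how random the output is: values near 0 make generation more deterministic, while values at or above 1 produce more varied (and often less coherent) text.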