---
license: apache-2.0
pipeline_tag: text-generation
tags:
- text-generation
- quantized
base_model: 0xtaipoian/open-lilm-v2
library_name: transformers
widget:
- text: "大學是三年好還是四年好?敢情是四年制好。大學不一定是學術自由的場所,還是戀愛轉型的金鐘中途站。"
---
# open-lilm-v2-q4
This is a quantized version of open-lilm-v2 with no other modifications.

Like the original model, it is intended only for research or entertainment purposes.

**Warning:** Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.
## Model Details
- Name: open-lilm-v2-q4
- Quantization: 4-bit quantization
- Base Model: 0xtaipoian/open-lilm-v2
## Usage
This model can be used with Hugging Face's Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "liemo/open-lilm-v2-q4"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Format the conversation with the model's chat template and tokenize it
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda:0")
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )
    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
    return response

messages = [
    {"role": "user",
     "content": """
INPUT_CONTENT_HERE
"""}
]

result = chat(messages, max_new_tokens=200, temperature=1)
print(result)
```