0xtaipoian committed
Commit 01c232d · verified · Parent: b9a0edf

Update README.md

Files changed (1):
1. README.md (+8, −3)
README.md CHANGED
@@ -42,9 +42,14 @@ The comments on LIHKG also tend to be very short. Thus the model cannot generate
 ## How to use it?
 You can run it on [Colab](https://colab.research.google.com/drive/1FgdwkkPcLzn_x1ohgzJCA1xZ4MTesC_8?usp=sharing) or anywhere you want based on the code:
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizer
+
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, LlamaTokenizer, GenerationConfig, pipeline
 from peft import PeftModel, PeftMixedModel
+import torch
+import pprint
 
+# enable torch CUDA tf32
+torch.backends.cudnn.allow_tf32 = True
 
 model_name = "0xtaipoian/open-lilm"
 
@@ -54,16 +59,16 @@ bnb_config = BitsAndBytesConfig(
     bnb_4bit_quant_type="nf4",
     bnb_4bit_compute_dtype=torch.bfloat16
 )
+tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype=torch.bfloat16,
     device_map='auto',
     trust_remote_code=True,
     quantization_config=bnb_config,
+    revision="main", # qlora-merged (qLoRA finetuned for 3 epochs) or main (full-parameter finetune for 1 epoch)
 )
 
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-
 
 def chat(messages, temperature=0.9, max_new_tokens=200):
     input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to('cuda:0')
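
The README snippet shown in this diff cuts off at the first line of the `chat` helper. For readers who want to try the updated code end to end, here is a minimal sketch of how the rest of that helper and a call to it could look, assuming the usual `model.generate` / `tokenizer.decode` pattern; the generation arguments, the decoding step, and the example prompt are illustrative and are not part of this commit.

```python
# Hypothetical completion of the README's chat() helper: the diff truncates it
# after apply_chat_template, so everything below that call is an assumption.
# It reuses the `tokenizer` and `model` objects loaded earlier in the snippet.
def chat(messages, temperature=0.9, max_new_tokens=200):
    # Format the conversation with the model's chat template and move it to the GPU
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors='pt',
    ).to('cuda:0')
    # Sample a continuation from the quantized model
    output_ids = model.generate(
        input_ids,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
    )
    # Decode only the newly generated tokens, dropping the prompt
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Illustrative call (the prompt is made up, not taken from the README)
messages = [{"role": "user", "content": "今日返工好攰，點算好？"}]
print(chat(messages))
```

Per the comment added in this commit, passing `revision="qlora-merged"` instead of `revision="main"` to `from_pretrained` loads the qLoRA-finetuned weights (3 epochs) rather than the full-parameter finetune (1 epoch).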