--- license: apache-2.0 datasets: - benchang1110/ChatTaiwan language: - zh pipeline_tag: text-generation widget: - example_title: 範例一 messages: - role: user content: >- 你好 --- ## Model Card for Model ID This model is the instruction finetuning version of [benchang1110/Taiwan-tinyllama-v1.0-base](https://huggingface.co/benchang1110/Taiwan-tinyllama-v1.0-base). ## Usage ```python import torch, transformers def generate_response(): model = transformers.AutoModelForCausalLM.from_pretrained("benchang1110/Taiwan-tinyllama-v1.0-chat", torch_dtype=torch.bfloat16, device_map=device,attn_implementation="flash_attention_2") tokenizer = transformers.AutoTokenizer.from_pretrained("benchang1110/Taiwan-tinyllama-v1.0-chat") streamer = transformers.TextStreamer(tokenizer,skip_prompt=True) while(1): prompt = input('USER:') if prompt == "exit": break print("Assistant: ") message = [ {'content': prompt, 'role': 'user'}, ] untokenized_chat = tokenizer.apply_chat_template(message,tokenize=False,add_generation_prompt=False) inputs = tokenizer.encode_plus(untokenized_chat, add_special_tokens=True, return_tensors="pt",return_attention_mask=True).to(device) outputs = model.generate(inputs["input_ids"],attention_mask=inputs['attention_mask'],streamer=streamer,use_cache=True,max_new_tokens=512,do_sample=True,temperature=0.1,repetition_penalty=1.2) if __name__ == '__main__': device = 'cuda' if torch.cuda.is_available() else 'cpu' generate_response() ```