---
language:
- zh
license: apache-2.0
datasets:
- benchang1110/ChatTaiwan
pipeline_tag: text-generation
widget:
- example_title: 範例一
  messages:
  - role: user
    content: 你好
---
|
## Model Card for Taiwan-tinyllama-v1.0-chat

This model is the instruction-finetuned version of [benchang1110/Taiwan-tinyllama-v1.0-base](https://huggingface.co/benchang1110/Taiwan-tinyllama-v1.0-base), trained on the [benchang1110/ChatTaiwan](https://huggingface.co/datasets/benchang1110/ChatTaiwan) dataset.

## Usage

The script below loads the model and runs an interactive chat loop in the terminal; type `exit` to quit.
|
```python
import torch
import transformers


def generate_response():
    # Load the chat model and tokenizer. flash_attention_2 requires the
    # flash-attn package and a supported GPU; drop the argument to fall back
    # to the default attention implementation.
    model = transformers.AutoModelForCausalLM.from_pretrained(
        "benchang1110/Taiwan-tinyllama-v1.0-chat",
        torch_dtype=torch.bfloat16,
        device_map=device,
        attn_implementation="flash_attention_2",
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained("benchang1110/Taiwan-tinyllama-v1.0-chat")
    # Stream generated tokens to stdout as they arrive, skipping the echoed prompt.
    streamer = transformers.TextStreamer(tokenizer, skip_prompt=True)
    while True:
        prompt = input('USER: ')
        if prompt == "exit":
            break
        print("Assistant: ")
        message = [
            {'content': prompt, 'role': 'user'},
        ]
        # Render the conversation with the model's chat template, then tokenize.
        untokenized_chat = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False)
        inputs = tokenizer(untokenized_chat, add_special_tokens=True, return_tensors="pt").to(device)
        # Low temperature plus a repetition penalty keeps replies focused.
        model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            streamer=streamer,
            use_cache=True,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.1,
            repetition_penalty=1.2,
        )


if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    generate_response()
```
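
For a quick one-shot test without the interactive loop, the high-level `pipeline` API can also drive the model. This is a minimal sketch, assuming a recent `transformers` release whose text-generation pipeline accepts chat-style message lists; the sampling settings here are illustrative, not the script's defaults.

```python
import transformers

# Minimal sketch: one-shot chat completion via the text-generation pipeline.
pipe = transformers.pipeline(
    "text-generation",
    model="benchang1110/Taiwan-tinyllama-v1.0-chat",
    device_map="auto",
)
messages = [{"role": "user", "content": "你好"}]
result = pipe(messages, max_new_tokens=128, do_sample=True, temperature=0.1)
# The pipeline returns the conversation with the assistant's reply appended.
print(result[0]["generated_text"][-1]["content"])
```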