banghua committed
Commit 4244167 · verified · 1 Parent(s): d7193d2

Update README.md

Files changed (1)
  1. README.md +27 -22
README.md CHANGED
@@ -34,35 +34,40 @@ Benchmark performance:
 ## Usage
 Athene-V2-Chat uses the same chat template as Qwen 2.5 72B. Below is a simple usage example using the Transformers library.
 ```Python
-import transformers
-import torch
-model_id = "Nexusflow/Athene-V2-Chat"
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
-    device_map="auto",
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Nexusflow/Athene-V2-Chat"
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
 )
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+prompt = "Give me a short introduction to large language models."
 messages = [
-    {"role": "system", "content": "You are an Athene Noctua; you can only speak with owl sounds. Whoooo whooo."},
-    {"role": "user", "content": "Whooo are you?"},
-]
-terminators = [
-    pipeline.tokenizer.eos_token_id,
-    pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>")
+    {"role": "user", "content": prompt}
 ]
-outputs = pipeline(
+text = tokenizer.apply_chat_template(
     messages,
-    max_new_tokens=256,
-    eos_token_id=terminators,
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.9,
+    tokenize=False,
+    add_generation_prompt=True
 )
-print(outputs[0]["generated_text"][-1])
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
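As a quick illustration of the shared chat template: printing the `text` built in the new example shows the ChatML-style prompt the model consumes. This is a sketch under the assumption that Athene-V2-Chat ships Qwen 2.5's template unchanged (Qwen's stock template may also prepend a default system message), so the exact rendering may differ:

```Python
# Inspect the prompt string produced by apply_chat_template above.
# Assumes Athene-V2-Chat reuses Qwen 2.5's ChatML-style template; the exact
# special tokens and any default system message come from the tokenizer config.
print(text)
# Roughly:
# <|im_start|>user
# Give me a short introduction to large language models.<|im_end|>
# <|im_start|>assistant
```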

 We found that adding a system prompt that enforces step-by-step thinking makes the model even better at math and at problems like counting the `r`s in "strawberry". For fairness, we **do not** include such a system prompt during chat evaluation.
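For illustration, such a system prompt goes in as the first message before applying the chat template. The wording below is a hypothetical example, not the authors' exact prompt:

```Python
# Hypothetical step-by-step system prompt (illustrative wording only),
# reusing the tokenizer and model loaded in the usage example above.
messages = [
    {"role": "system", "content": "Think step by step before giving your final answer."},
    {"role": "user", "content": "How many r's are in the word strawberry?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
```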

 ## Acknowledgment
-We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We would like to thank Meta AI and the open source community for their efforts in providing the datasets and base models.
+We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We would like to thank the Qwen Team and the open source community for their efforts in providing the datasets and base models.