evan-nexusflow committed on
Commit 4d7a1e9 · verified · 1 Parent(s): f12e910

Update README.md

Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -16,8 +16,8 @@ tags:
  </p>
 
 
- We introduce Athene-V2-Chat-72B, an open-weights LLM that rivals GPT-4o across benchmarks. It is trained through RLHF based off Qwen-2.5-72B.
- Athene-V2-Chat-72B excels in chat, math and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Chat), surpasses GPT-4o in complex function calling and agent applications.
+ We introduce Athene-V2-Chat-72B, an open-weights LLM on-par with GPT-4o across benchmarks. It is trained through RLHF with Qwen-2.5-72B-Instruct as base model.
+ Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Chat), surpasses GPT-4o in complex function calling and agentic applications.
 
  Benchmark performance:
 
@@ -27,12 +27,13 @@ Benchmark performance:
 
  - **Developed by:** The Nexusflow Team
  - **Model type:** Chat Model
- - **Finetuned from model:** [Qwen 2.5 72B](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
+ - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
  - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
  - **Blog**: https://nexusflow.ai/blogs/athene-V2
 
  ## Usage
  Athene-V2-Chat uses the same chat template as Qwen 2.5 72B. Below is an example simple usage using the Transformers library.
+
  ```Python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -45,21 +46,25 @@ model = AutoModelForCausalLM.from_pretrained(
  )
  tokenizer = AutoTokenizer.from_pretrained(model_name)
 
- prompt = "Give me a short introduction to large language model."
+ prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
+
  messages = [
      {"role": "user", "content": prompt}
  ]
+
  text = tokenizer.apply_chat_template(
      messages,
      tokenize=False,
      add_generation_prompt=True
  )
+
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
  generated_ids = model.generate(
      **model_inputs,
-     max_new_tokens=512
+     max_new_tokens=2048
  )
+
  generated_ids = [
      output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
@@ -67,7 +72,7 @@ generated_ids = [
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```
 
- We found that by adding system prompts that enforce the model to think step by step, the model can do even better in math and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
+ Note that by adding a system prompt that encourages the model to think step by step, the model can improve further on difficult math queries and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
 
  ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support of testing the model. We would like to thank Qwen Team and the open source community for their efforts in providing the datasets and base models.
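
The note in the diff about step-by-step system prompts can be illustrated with a small sketch. The README does not publish the system prompt the authors tried, so the wording of `STEP_BY_STEP_SYSTEM_PROMPT` and the helper name `build_messages` below are assumptions made purely for illustration. The sketch only builds the message list; it would then be passed to `tokenizer.apply_chat_template` the same way as `messages` in the usage example.

```python
from typing import Dict, List, Optional

# Hypothetical wording: the README does not state the exact system prompt used.
STEP_BY_STEP_SYSTEM_PROMPT = "Think step by step before giving your final answer."


def build_messages(prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Return a chat message list, optionally prefixed with a system message."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        # The system message goes first so the chat template places it ahead of the user turn.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages


messages = build_messages(
    "How many times does the letter r appear in strawberry?",
    STEP_BY_STEP_SYSTEM_PROMPT,
)
```

Calling `build_messages(prompt)` without a system prompt should reproduce the single-message list from the usage example, which matches the setup the README says was used for chat evaluation.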