Nexusflow/Athene-V2-Chat

banghua committed on Nov 14, 2024

Commit 4244167 (verified), 1 parent: d7193d2

Update README.md

Files changed (1)

README.md +27 -22

@@ -34,35 +34,40 @@ Benchmark performance:
 ## Usage
 Athene-V2-Chat uses the same chat template as Qwen 2.5 72B. Below is an example simple usage using the Transformers library.
 ```Python
-import transformers
-import torch
-model_id = "Nexusflow/Athene-V2-Chat"
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
-    device_map="auto",
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "Nexusflow/Athene-V2-Chat"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
 )
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+prompt = "Give me a short introduction to large language model."
 messages = [
-    {"role": "system", "content": "You are an Athene Noctura, you can only speak with owl sounds. Whoooo whooo."},
-    {"role": "user", "content": "Whooo are you?"},
-]
-terminators = [
-    pipeline.tokenizer.eos_token_id,
-    pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>")
+    {"role": "user", "content": prompt}
 ]
-outputs = pipeline(
+text = tokenizer.apply_chat_template(
     messages,
-    max_new_tokens=256,
-    eos_token_id=terminators,
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.9,
+    tokenize=False,
+    add_generation_prompt=True
 )
-print(outputs[0]["generated_text"][-1])
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 We found that by adding system prompts that enforce the model to think step by step, the model can do even better in math and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
 ## Acknowledgment
-We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support of testing the model. We would like to thank Meta AI and the open source community for their efforts in providing the datasets and base models.
+We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support of testing the model. We would like to thank Qwen Team and the open source community for their efforts in providing the datasets and base models.
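The updated README example hands prompt formatting to `tokenizer.apply_chat_template(..., add_generation_prompt=True)`. It then strips the prompt tokens from each sequence returned by `model.generate`. The sketch below illustrates both steps without downloading the 72B model. It is a hedged illustration, not the model's actual template. `render_chatml` is a hypothetical, simplified stand-in for a Qwen 2.5 style (ChatML) template that wraps each turn in `<|im_start|>` / `<|im_end|>` markers. The real template may also insert a default system message and handle tool calls, so check `tokenizer.chat_template` before relying on this. `strip_prompt` mirrors the list comprehension in the new code. The README does not publish the step-by-step system prompt it tested, so `STEP_BY_STEP_SYSTEM` is an assumed wording.

```python
from typing import Dict, List, Sequence

# Hypothetical system prompt: the README reports that step-by-step prompts help on
# math and letter counting, but it does not give the exact wording it used.
STEP_BY_STEP_SYSTEM = "Think step by step before giving your final answer."


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Simplified approximation of a Qwen 2.5 (ChatML-style) chat template.

    Assumption: each turn is rendered as <|im_start|>{role}\\n{content}<|im_end|>\\n.
    The real template may also add a default system message and handle tools.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues speaking as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_prompt(
    input_ids: Sequence[Sequence[int]], generated_ids: Sequence[Sequence[int]]
) -> List[List[int]]:
    """Drop the prompt tokens that generate() returns at the start of each output row."""
    return [list(out[len(inp):]) for inp, out in zip(input_ids, generated_ids)]


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": STEP_BY_STEP_SYSTEM},
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ]
    print(render_chatml(messages))

    # Toy token ids: a 3-token prompt followed by 2 newly generated tokens.
    prompts = [[101, 102, 103]]
    outputs = [[101, 102, 103, 7, 8]]
    print(strip_prompt(prompts, outputs))  # [[7, 8]]
```

The removed `terminators` list passed `"<|end_of_text|>"` as a stop token. That is a Llama 3 special token. ChatML-style templates such as Qwen 2.5's close each turn with `<|im_end|>` instead, which may be one reason the updated example relies on the tokenizer's and model's own defaults.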