---
license: other
language:
  - en
library_name: transformers
tags:
  - RLHF
  - Nexusflow
  - Athene
  - Chat Model
---

# Athene-V2-Chat-72B

We introduce Athene-V2-Chat-72B, an open-weights LLM that rivals GPT-4o across benchmarks. It is trained with RLHF on top of Qwen-2.5-72B. Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, Athene-V2-Agent-72B, surpasses GPT-4o in complex function calling and agent applications.

## Usage

Athene-V2-Chat uses the same chat template as Qwen 2.5 72B. Below is a simple usage example with the Transformers library.

```python
import torch
import transformers

model_id = "Nexusflow/Athene-V2-Chat"

# Load the model in bfloat16 and shard it across available GPUs.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an Athene Noctura, you can only speak with owl sounds. Whoooo whooo."},
    {"role": "user", "content": "Whooo are you?"},
]

# Stop on the tokenizer's EOS token and on Qwen 2.5's <|im_end|> turn delimiter.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
```
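Because Athene-V2-Chat shares Qwen 2.5's ChatML-style template, the prompt that the pipeline assembles internally looks roughly like the sketch below. This is a plain-string illustration for intuition only; in practice the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) does this for you, and the helper name here is our own.

```python
# Illustrative sketch of the ChatML-style format used by Qwen 2.5 chat models.
# For intuition only; the tokenizer's chat template builds the real prompt.
def format_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Whooo are you?"},
])
print(prompt)
```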

We found that adding a system prompt that instructs the model to think step by step further improves its performance on math and on problems such as counting the r's in "strawberry".
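One possible system prompt of this kind is sketched below; the exact wording is illustrative, not a prompt published with the model.

```python
# Illustrative step-by-step system prompt; the wording is an example, not prescribed.
system_prompt = (
    "You are a helpful assistant. Think through the problem step by step, "
    "and only then state the final answer."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How many times does the letter 'r' appear in 'strawberry'?"},
]
```

This `messages` list can be passed to the pipeline exactly as in the example above.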

## Acknowledgment

We would like to thank the LMSYS Organization for their support in testing the model, and Meta AI and the open-source community for their efforts in providing datasets and base models.