opendata-chinese-llama2
The opendata-chinese-llama2 project is based on Llama-2, released by Meta. We open-source three 13B
models in this project:
- opendata-chinese-llama2-sft, an instruction-finetuned version based on Llama-2 13B base
- opendata-chinese-llama2-reward, a reward model trained from the above sft model
- opendata-chinese-llama2-chat, a PPO model based on the above sft model and reward model
We trained all of our models on completely open-source datasets.
- stage-sft: training data collected from QingyiSi/Alpaca-CoT. The final dataset contains over 5M instructions.
- stage-reward: training data includes open-source data from Anthropic/hh-rlhf and OpenAssistant/oasst1, plus translated versions of both datasets.
- stage-ppo: 50k training prompts were sampled from stage-sft and stage-reward (only the positive/top output); the unsupervised training data is sampled from the above two stages.
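The stage-ppo prompt sampling described above could be sketched roughly as follows. This is only an illustration: the function name `sample_ppo_prompts`, pooling the two sources uniformly, de-duplicating prompts, and the fixed seed are all assumptions, since the actual mixing procedure is not specified here.

```python
import random
from typing import List, Sequence


def sample_ppo_prompts(
    sft_prompts: Sequence[str],
    reward_prompts: Sequence[str],
    k: int = 50_000,
    seed: int = 0,
) -> List[str]:
    """Draw k distinct prompts from the pooled stage-sft and stage-reward prompts.

    Assumption: both sources are pooled and sampled uniformly without
    replacement; the real mixing ratio is not documented.
    """
    # Remove duplicate prompts while keeping first-seen order.
    pool = list(dict.fromkeys([*sft_prompts, *reward_prompts]))
    if k > len(pool):
        raise ValueError(f"requested {k} prompts but only {len(pool)} are available")
    return random.Random(seed).sample(pool, k)


# Toy usage with a handful of hypothetical prompts.
toy = sample_ppo_prompts(["a", "b", "c"], ["c", "d"], k=3)
```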
We fully fine-tuned all our models with DeepSpeed-Chat.
Models
Training Details
For supervised fine-tuning, we use a cosine learning rate schedule with an initial learning rate of 1e−5, a weight decay of 0.1, a batch size of 64, and a sequence length of 4096 tokens. We train for 2 epochs over the training dataset. We also concatenate multiple instruction-output pairs to fill the 4096-token sequence.
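Concatenating several instruction-output pairs into one sequence can be illustrated with a greedy packing sketch. The helper name `pack_sequences`, the greedy strategy, and truncating over-long examples are assumptions made for illustration; the project's actual packing code may differ.

```python
from typing import Iterable, List


def pack_sequences(examples: Iterable[List[int]], max_len: int = 4096) -> List[List[int]]:
    """Greedily concatenate tokenized examples into rows of at most max_len tokens."""
    packed: List[List[int]] = []
    current: List[int] = []
    for tokens in examples:
        tokens = tokens[:max_len]  # assumption: truncate examples longer than max_len
        # Start a new row when the next example would overflow the current one.
        if current and len(current) + len(tokens) > max_len:
            packed.append(current)
            current = []
        current = current + tokens
    if current:
        packed.append(current)
    return packed


# Toy usage with a tiny max_len so the behaviour is easy to follow.
rows = pack_sequences([[1, 2, 3], [4, 5, 6, 7], [8, 9], [10] * 10], max_len=8)
```

Each example is assumed to already carry its own bos and eos tokens, so the boundaries between packed pairs remain visible to the model.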
For the reward model, we use a cosine learning rate schedule with an initial learning rate of 1e−5, a weight decay of 0.1, a batch size of 128 (128 pairs, 256 rows), and a sequence length of 2048 tokens. We train for 2 epochs over the training dataset.
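The SFT and reward stages above both use a cosine learning rate schedule starting from 1e−5. A minimal sketch of such a schedule is shown below; the function name `cosine_lr`, the optional linear warmup, and decaying to a minimum of 0 are assumptions, since the warmup length and final learning rate are not stated here.

```python
import math


def cosine_lr(
    step: int,
    total_steps: int,
    peak_lr: float = 1e-5,
    min_lr: float = 0.0,      # assumption: decay all the way to zero
    warmup_steps: int = 0,    # assumption: no warmup unless specified
) -> float:
    """Learning rate at `step` for a cosine decay with optional linear warmup."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# The rate starts at the peak, is halved at the midpoint, and ends at min_lr.
start, middle, end = cosine_lr(0, 100), cosine_lr(50, 100), cosine_lr(100, 100)
```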
For the PPO stage, we use a constant learning rate of 1e−6, a batch size of 128, a mini-batch size of 32, a ppo-epoch of 1, and a sequence length of 2048 tokens. We train for one epoch over the training dataset.
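To make the PPO batch settings concrete, the rough arithmetic below counts rollout batches and optimizer updates for one epoch. It assumes 50k prompts, that the batch size of 128 is the global rollout batch, and that an incomplete final batch is dropped; gradient accumulation and data parallelism are ignored, so the real step count may differ.

```python
from typing import Tuple


def ppo_update_counts(
    num_prompts: int,
    batch_size: int = 128,
    mini_batch_size: int = 32,
    ppo_epochs: int = 1,
) -> Tuple[int, int, int]:
    """Return (rollout batches, updates per rollout batch, total updates)."""
    # Assumption: the incomplete final batch is dropped.
    rollout_batches = num_prompts // batch_size
    # Each rollout batch is split into mini-batches and replayed ppo_epochs times.
    updates_per_rollout = (batch_size // mini_batch_size) * ppo_epochs
    return rollout_batches, updates_per_rollout, rollout_batches * updates_per_rollout


rollouts, per_rollout, total = ppo_update_counts(50_000)
```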
We try to stay close to the training parameters in the original Llama 2 paper.
For the sample format, we format our data with a Human-Assistant template, with the Human prefix drawn from Human, USER, [HM] and the Assistant prefix drawn from Assistant, AI, [AI]. We add a bos token and an eos token to each turn.
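The template can be illustrated with a small formatting helper. The function name `format_dialogue`, the role labels, and the `sep` parameter are hypothetical; the `<s>`/`</s>` markers and the newline after each prefix follow the usage examples in this README.

```python
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"


def format_dialogue(
    turns: List[Tuple[str, str]],
    human_prefix: str = "Human",
    bot_prefix: str = "AI",
    add_generation_prompt: bool = True,
    sep: str = "",
) -> str:
    """Render (role, content) turns, wrapping every turn in bos/eos tokens."""
    parts = []
    for role, content in turns:
        if role not in ("human", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        prefix = human_prefix if role == "human" else bot_prefix
        parts.append(f"{BOS}{prefix}\n{content}{EOS}")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append(f"{BOS}{bot_prefix}\n")
    return sep.join(parts)


prompt = format_dialogue([("human", "hi")])
```

Because the special tokens are written into the string, the text should be tokenized with `add_special_tokens=False`, as the usage examples do.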
Model Download
Below are the full models, which can be used directly.
Model Name | Type | Training Data | Download Link |
---|---|---|---|
opendata-chinese-llama2-sft-13B | sft Model | 5M Instructions | [HuggingFace] |
opendata-chinese-llama2-reward-13B | reward Model | 160k ranking pairs | [HuggingFace] |
opendata-chinese-llama2-chat-13B | chat Model | 50k prompts | [HuggingFace] |
Usage
sft/chat model usage
import torch
from transformers import AutoConfig, AutoModelForCausalLM, LlamaTokenizer

def load_model(model_path):
    # Load the config and weights, then move the model to the GPU in eval mode.
    model_config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        config=model_config,
        trust_remote_code=True)
    model = model.eval().cuda()
    tokenizer = LlamaTokenizer.from_pretrained(model_path, fast_tokenizer=True)
    return model, tokenizer

model, tokenizer = load_model("pandaExplosion/opendata-chinese-llama2-chat-13B")

prefix_human = "Human"
prefix_bot = "AI"
query = "给出10个暴富的建议"  # "Give 10 suggestions for getting rich quickly"
# bos/eos are written explicitly, so the tokenizer must not add special tokens.
text = f"<s>{prefix_human}\n{query}</s><s>{prefix_bot}\n"
# Move the inputs to the same device as the model.
input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids.cuda()

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        do_sample=False,  # greedy decoding
        max_new_tokens=512,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        num_return_sequences=1,
        return_dict_in_generate=True,
        repetition_penalty=1.1,
        top_p=0.95)  # only takes effect when do_sample=True

# Keep only the newly generated tokens, dropping the prompt.
out = outputs.sequences[:, input_ids.shape[-1]:]
out_text = tokenizer.batch_decode(out, skip_special_tokens=True)
print(out_text)
reward model usage
import torch
from transformers import LlamaTokenizer
from reward_model import create_critic_model  # modified from deepspeedChat

def load_reward_model(model_path):
    tokenizer = LlamaTokenizer.from_pretrained(model_path, fast_tokenizer=True)
    model = create_critic_model(model_path, tokenizer)
    model = model.eval().cuda()
    return model, tokenizer

model, tokenizer = load_reward_model("pandaExplosion/opendata-chinese-llama2-reward-13B")

prefix_human = "Human"
prefix_bot = "AI"
query = "给出10个暴富的建议"  # "Give 10 suggestions for getting rich quickly"
response = "不给"  # "No"
# The prompt alone is tokenized to find where the response starts.
prompt = f"<s>{prefix_human}\n{query}</s>" + "\n" + f"<s>{prefix_bot}\n"
text = f"<s>{prefix_human}\n{query}</s>" + "\n" + f"<s>{prefix_bot}\n{response}</s>"
prompt_len = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.shape[-1]
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    reward = model.forward_value(
        inputs['input_ids'].cuda(),
        attention_mask=inputs['attention_mask'].cuda(),
        prompt_length=prompt_len)

print(reward['chosen_end_scores'])  # [-2.5229]
The reward_model.py file is in [reward model dir].
Evaluation
Generation Performance
To be continued.
NLU Performance Evaluation
We report the C-Eval results of our models.
Model | Valid (5-shot) | TestAvg (5-shot) | TestAvgHard (5-shot) |
---|---|---|---|
LLaMA-2-13B* | 37.3 | 36.6 | 31.7 |
LLaMA-2-chat-13B* | 38.6 | 37.2 | 30.0 |
opendata-chinese-llama2-sft-13B | 39.5 | 40.2 | 30.1 |
opendata-chinese-llama2-chat-13B | 40.3 | 40.5 | 29.8 |
Models marked with * (LLaMA-2) were tested in our own environment.
Acknowledgments
We would like to express our gratitude to the related projects.