opendata-chinese-llama2

The opendata-chinese-llama2 project is built on Llama-2, released by Meta. We open-source three 13B models in this project:

  • opendata-chinese-llama2-sft, an instruction fine-tuned version of the Llama-2 13B base model
  • opendata-chinese-llama2-reward, a reward model trained from the SFT model above
  • opendata-chinese-llama2-chat, a PPO model trained from the SFT model and reward model above

We trained all of our models on completely open-source datasets.

  • stage-sft: training data collected from QingyiSi/Alpaca-CoT. The final dataset contains over 5M instructions.
  • stage-reward: open-source training data from Anthropic/hh-rlhf and OpenAssistant/oasst1, together with translated versions of both datasets.
  • stage-ppo: 50k training prompts sampled from stage-sft and stage-reward (only the positive/top output); the unsupervised training data is also sampled from these two stages.

We fully fine-tuned all our models with DeepSpeed-Chat.


Models

Training Details

For supervised fine-tuning, we use a cosine learning rate schedule with an initial learning rate of 1e-5, a weight decay of 0.1, a batch size of 64, and a sequence length of 4096 tokens. We train for 2 epochs over the training dataset. We also concatenate multiple instruction-output pairs to fill each 4096-token sequence.
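The concatenation step can be pictured as greedy packing of already-tokenized examples. The sketch below is only an illustration under assumptions: the function name `pack_examples`, the rule of truncating over-long examples, and the choice never to split an example across two sequences are hypothetical and are not taken from the project's training code.

```python
def pack_examples(tokenized_examples, max_len=4096):
    """Greedily concatenate token-id lists into sequences of at most max_len tokens.

    Hypothetical helper: each example is assumed to already contain its own
    bos/eos tokens, and an example is never split across two sequences.
    """
    packed, current = [], []
    for ids in tokenized_examples:
        ids = ids[:max_len]  # assumption: truncate an example longer than max_len
        if current and len(current) + len(ids) > max_len:
            packed.append(current)  # current sequence is full, start a new one
            current = []
        current = current + ids
    if current:
        packed.append(current)
    return packed


# Tiny demonstration with max_len=5 instead of 4096.
print(pack_examples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=5))
```

With `max_len=5`, the first two examples share one sequence and the third starts a new one.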

For the reward model, we use a cosine learning rate schedule with an initial learning rate of 1e-5, a weight decay of 0.1, a batch size of 128 (128 pairs, 256 rows), and a sequence length of 2048 tokens. We train for 2 epochs over the training dataset.
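To make the "128 pairs, 256 rows" figure concrete, here is a minimal sketch of how ranking pairs are commonly laid out and scored. It assumes the usual pairwise ranking objective, -log(sigmoid(chosen - rejected)), applied to one scalar score per response; the helper names are hypothetical, and the actual loss in the project's modified DeepSpeed-Chat code may differ in detail.

```python
import math


def stack_pairs(pairs):
    """Lay out (chosen, rejected) pairs as rows: all chosen first, then all rejected."""
    chosen = [c for c, _ in pairs]
    rejected = [r for _, r in pairs]
    return chosen + rejected


def pairwise_ranking_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)), written as log(1 + exp(-(c - r)))."""
    losses = [
        math.log1p(math.exp(-(c - r)))
        for c, r in zip(chosen_scores, rejected_scores)
    ]
    return sum(losses) / len(losses)


rows = stack_pairs([("good answer", "bad answer")] * 128)
print(len(rows))  # 128 pairs become 256 rows
print(pairwise_ranking_loss([1.0, 0.5], [-1.0, 0.5]))
```

The loss shrinks as the chosen response is scored further above the rejected one, and equals log 2 when the two scores are tied.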

For the PPO stage, we use a constant learning rate of 1e-6, a batch size of 128, a mini-batch size of 32, a PPO epoch count of 1, and a sequence length of 2048 tokens. We train for one epoch over the training dataset.
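As a rough back-of-the-envelope check of what these PPO settings imply, the sketch below counts rollout batches and optimizer updates. It assumes exactly 50,000 prompts, that 128 is the global batch size, that the final partial batch is kept, and that there is no gradient accumulation; none of these are confirmed by the project, and `ppo_update_counts` is a hypothetical helper.

```python
import math


def ppo_update_counts(num_prompts, batch_size=128, mini_batch_size=32, ppo_epochs=1):
    """Return (rollout batches, updates per rollout batch, total updates)."""
    rollout_steps = math.ceil(num_prompts / batch_size)
    updates_per_rollout = (batch_size // mini_batch_size) * ppo_epochs
    return rollout_steps, updates_per_rollout, rollout_steps * updates_per_rollout


steps, per_rollout, total = ppo_update_counts(50_000)
print(steps, per_rollout, total)  # 391 4 1564
```

Under these assumptions, one pass over the prompts is about 391 rollout batches, each split into 4 mini-batch updates.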

We try to stay close to the training parameters in the original Llama 2 paper.

For the sample format, we convert our data to a Human-Assistant template. The Human prefix is drawn from Human, USER, [HM] and the Assistant prefix from Assistant, AI, [AI]. We add a BOS token and an EOS token to each turn.
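A small helper can make the template explicit. The sketch below mirrors the strings used in the Usage section (`<s>` and `</s>` around each turn, prefix followed by a newline); the function name `format_dialogue`, the `sep` parameter, and the handling of multi-turn input are assumptions for illustration rather than the project's actual preprocessing code.

```python
# Prefix pools as listed in the description above.
HUMAN_PREFIXES = ["Human", "USER", "[HM]"]
ASSISTANT_PREFIXES = ["Assistant", "AI", "[AI]"]
BOS, EOS = "<s>", "</s>"


def format_dialogue(turns, human_prefix="Human", bot_prefix="AI", sep=""):
    """Format (query, response) turns into the Human-Assistant template.

    A response of None leaves the assistant turn open so the model can complete it.
    `sep` is placed between turns (hypothetical parameter): the sft/chat usage
    example uses "" and the reward model usage example uses a newline.
    """
    parts = []
    for query, response in turns:
        parts.append(f"{BOS}{human_prefix}\n{query}{EOS}")
        if response is None:
            parts.append(f"{BOS}{bot_prefix}\n")  # open turn for generation
        else:
            parts.append(f"{BOS}{bot_prefix}\n{response}{EOS}")
    return sep.join(parts)


print(repr(format_dialogue([("Hello", None)])))
print(repr(format_dialogue([("Hello", "Hi there")], sep="\n")))
```

Because the special tokens are written into the text itself, the usage examples tokenize with `add_special_tokens=False` so that they are not added a second time.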

Model Download

Below are the full models, which can be used directly.

| Model Name | Type | Training Data | Download Link |
|---|---|---|---|
| opendata-chinese-llama2-sft-13B | SFT model | 5M instructions | [HuggingFace] |
| opendata-chinese-llama2-reward-13B | Reward model | 160k ranking pairs | [HuggingFace] |
| opendata-chinese-llama2-chat-13B | Chat model | 50k prompts | [HuggingFace] |

Usage

sft/chat model usage

import torch
from transformers import AutoConfig, AutoModelForCausalLM, LlamaTokenizer

def load_model(model_path):
    # Load the model in eval mode on the GPU, together with its tokenizer.
    model_config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        config=model_config,
        trust_remote_code=True)
    model = model.eval().cuda()
    tokenizer = LlamaTokenizer.from_pretrained(model_path, fast_tokenizer=True)
    return model, tokenizer

model, tokenizer = load_model("pandaExplosion/opendata-chinese-llama2-chat-13B")

prefix_human = "Human"
prefix_bot = "AI"
query = "给出10个暴富的建议"  # "Give 10 suggestions for getting rich quickly"
# bos/eos tokens are written into the text, so the tokenizer must not add them again.
text = f"<s>{prefix_human}\n{query}</s><s>{prefix_bot}\n"
input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids
input_ids = input_ids.cuda()  # move the inputs to the same device as the model
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        do_sample=False,  # greedy decoding
        max_new_tokens=512,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        num_return_sequences=1,
        return_dict_in_generate=True,
        repetition_penalty=1.1,
        top_p=0.95)  # has no effect while do_sample=False
    # Keep only the newly generated tokens, dropping the prompt.
    out = outputs.sequences[:, input_ids.shape[-1]:]
    out_text = tokenizer.batch_decode(out, skip_special_tokens=True)
print(out_text)

reward model usage

import torch
from transformers import LlamaTokenizer
from reward_model import create_critic_model  # modified from DeepSpeed-Chat

def load_reward_model(model_path):
    # Load the reward (critic) model in eval mode on the GPU, together with its tokenizer.
    tokenizer = LlamaTokenizer.from_pretrained(model_path, fast_tokenizer=True)
    model = create_critic_model(model_path, tokenizer)
    model = model.eval().cuda()
    return model, tokenizer

model, tokenizer = load_reward_model("pandaExplosion/opendata-chinese-llama2-reward-13B")

prefix_human = "Human"
prefix_bot = "AI"
query = "给出10个暴富的建议"  # "Give 10 suggestions for getting rich quickly"
response = "不给"  # "No"
# The prompt alone is tokenized to find where the response starts in the full text.
prompt = f"<s>{prefix_human}\n{query}</s>" + "\n" + f"<s>{prefix_bot}\n"
text = f"<s>{prefix_human}\n{query}</s>" + "\n" + f"<s>{prefix_bot}\n{response}</s>"
prompt_len = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.shape[-1]
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False)
with torch.no_grad():
    reward = model.forward_value(
        inputs['input_ids'].cuda(),
        attention_mask=inputs['attention_mask'].cuda(),
        prompt_length=prompt_len)
    print(reward['chosen_end_scores'])  # [-2.5229]

The reward_model.py file is in [reward model dir].

Evaluation

Generation Performance

To be continued.

NLU Performance Evaluation

We report the results of our models on the C-Eval dataset.

| Model | Valid (5-shot) | TestAvg (5-shot) | TestAvgHard (5-shot) |
|---|---|---|---|
| LLaMA-2-13B* | 37.3 | 36.6 | 31.7 |
| LLaMA-2-chat-13B* | 38.6 | 37.2 | 30.0 |
| opendata-chinese-llama2-sft-13B | 39.5 | 40.2 | 30.1 |
| opendata-chinese-llama2-chat-13B | 40.3 | 40.5 | 29.8 |

* The LLaMA-2 models were tested in our own environment.

Acknowledgments

We would like to express our gratitude to the related projects.
