metadata

license: apache-2.0
language:
  - zh
  - en
library_name: transformers
tags:
  - qihoo360
  - 奇虎360
  - zhinao
  - 360Zhinao
  - pretrain


360Zhinao (360智脑)

🤗 Hugging Face | 🤖 ModelScope ｜ 💬 WeChat (微信)

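Visit the official 360Zhinao website at https://ai.360.com to try more, and more powerful, features.

Model Introduction

🎉🎉🎉 We have open-sourced the 360Zhinao model series. This release includes the following models: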

360Zhinao2-7B-Base
360Zhinao2-7B-Chat-4K
360Zhinao2-7B-Chat-32K
360Zhinao2-7B-Chat-360K

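Key features of the 360Zhinao models:

Base model: trained with the now-standard two-stage approach. Stage one uses a cosine learning-rate schedule over 10T tokens. Stage two increases the share of high-quality data and trains on 100B high-quality tokens, with the learning rate (LR) decayed directly to 0. In total, 360Zhinao2-7B was trained on 10.1T tokens.
Chat models: strong conversational ability, released with three context lengths: 4K, 32K, and 360K.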
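The two-stage schedule described above can be sketched as a function of tokens seen. This is only an illustration: the token budgets (10T and 100B) come from the text, while `peak_lr`, `min_lr`, the absence of warmup, and the linear shape of the stage-two decay are assumptions, not values published for 360Zhinao2-7B.

```python
import math

STAGE1_TOKENS = 10e12   # 10T tokens (from the text)
STAGE2_TOKENS = 100e9   # 100B high-quality tokens (from the text)


def two_stage_lr(tokens_seen, peak_lr=3e-4, min_lr=3e-5):
    """Hypothetical learning rate after `tokens_seen` training tokens.

    peak_lr and min_lr are placeholder values chosen for illustration.
    """
    if tokens_seen < STAGE1_TOKENS:
        # Stage 1: cosine decay from peak_lr down to min_lr.
        progress = tokens_seen / STAGE1_TOKENS
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr + (peak_lr - min_lr) * cosine
    # Stage 2: decay from min_lr to 0 (a linear shape is assumed here).
    progress = min((tokens_seen - STAGE1_TOKENS) / STAGE2_TOKENS, 1.0)
    return min_lr * (1.0 - progress)


if __name__ == "__main__":
    for tokens in (0, 5e12, 10e12, 10.05e12, 10.1e12):
        print(f"{tokens:.3e} tokens -> lr {two_stage_lr(tokens):.2e}")
```

The two budgets sum to the 10.1T total stated above, and the learning rate reaches exactly 0 at that point.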

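Updates

[2024.11.18] 🔥🔥🔥 We released 360Zhinao2-7B, including the Base model and Chat models with 4K, 32K, and 360K context lengths.
[2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first on the Retrieval and Reranking tasks of the C-MTEB leaderboard, respectively.
[2024.05.20] We extended the context window of llama3 to 360k and released llama3-8B-360Zhinao-360k-Instruct🤗
[2024.04.12] We released 360Zhinao-7B v1.0, including the Base model and Chat models with 4K, 32K, and 360K context lengths. See the technical report on arXiv for details.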

Download

The released versions and download links are listed in the table below:

Size	Model	BF16	Int4
7B	360Zhinao2-7B-Base	🤖 🤗
7B	360Zhinao2-7B-Chat-4K	🤖 🤗	🤖 🤗
7B	360Zhinao2-7B-Chat-32K	🤖 🤗	🤖 🤗
7B	360Zhinao2-7B-Chat-360K	🤖 🤗	🤖 🤗

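Model Evaluation

We evaluated the model with the open-source tool opencompass and compared it against open-source models under 10B parameters released in China and abroad over the past six months. 360Zhinao2-7B is competitive. It performs well on Chinese benchmarks such as CEval (Chinese exams), C3 (Chinese reading comprehension), and lcsts (Chinese short-text summarization), and ranks first on the average Chinese benchmark score (avg_zh). It also ranks first on the challenging competition-math dataset math. Overall, 360Zhinao2-7B shows an advantage in Chinese language processing and in complex mathematical reasoning.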

Type	Datasets	language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04
	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84
	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8
	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12
	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26
	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46
	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74
	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61
	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90
	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49
	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12
	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61
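The two Overall rows appear to be unweighted means of the per-dataset scores: avg_zh over the rows marked zh and avg_all over every row. This is inferred from the numbers rather than stated in the source. A short check for the 360Zhinao2-7B column, with scores copied from the table above:

```python
from statistics import mean

# (dataset, language, score) for the 360Zhinao2-7B column of the table.
scores = [
    ("ceval", "zh", 83.04), ("mmlu", "en", 67.84), ("cmmlu", "zh", 73.8),
    ("ARC-c", "en", 87.12), ("ARC-e", "en", 92.77), ("WiC", "en", 49.84),
    ("WSC", "en", 65.38), ("BoolQ", "en", 88.29),
    ("commonsense_qa", "en", 69.78), ("C3", "zh", 93.26),
    ("race-middle", "en", 90.46), ("race-high", "en", 86.74),
    ("lcsts", "zh", 18.61), ("eprstmt-dev", "zh", 90.0),
    ("lambada", "en", 72.56), ("hellaswag", "en", 71.49),
    ("siqa", "en", 77.12), ("bbh", "en", 46.54), ("humaneval", "en", 60.98),
    ("mbpp", "en", 54.0), ("math", "en", 38.34), ("gsm8k", "en", 75.51),
]

# Unweighted mean over the Chinese datasets only.
avg_zh = mean(score for _, lang, score in scores if lang == "zh")
# Unweighted mean over all 22 datasets.
avg_all = mean(score for _, _, score in scores)

print(f"avg_zh={avg_zh:.2f} avg_all={avg_all:.2f}")  # avg_zh=71.74 avg_all=70.61
```

Both values match the table, which suggests that every dataset carries equal weight regardless of its size or category.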

Base Model

Quickstart

The following simple examples show how to use 360Zhinao2-7B-Base and 360Zhinao2-7B-Chat with 🤖 ModelScope and 🤗 Transformers.

Dependency Installation

python 3.8 and above
pytorch 2.0 and above
transformers 4.37.2 and above
CUDA 11.4 and above are recommended.

pip install -r requirements.txt

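We recommend installing flash-attention (flash attention 2 is now supported) to improve runtime efficiency and reduce GPU memory usage. (flash-attention is optional; the project runs normally without it.)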

flash-attn >= 2.3.6

FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.6

🤗 Transformers

Base Model Inference

This code demonstrates quick inference with the 360Zhinao2-7B-Base model using transformers.

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    device_map="auto",
    trust_remote_code=True)

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Chat Model Inference

This code demonstrates quick inference with the 360Zhinao2-7B-Chat-4K model using transformers.

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    device_map="auto",
    trust_remote_code=True)

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

messages = []
#round-1
messages.append({"role": "user", "content": "介绍一下刘德华"})
response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
messages.append({"role": "assistant", "content": response})
print(messages)

#round-2
messages.append({"role": "user", "content": "他有什么代表作？"})
response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
messages.append({"role": "assistant", "content": response})
print(messages)

🤖 ModelScope

Base Model Inference

This code demonstrates quick inference with the 360Zhinao2-7B-Base model using ModelScope.

from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    device_map="auto",
    trust_remote_code=True)

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Chat Model Inference

This code demonstrates quick inference with the 360Zhinao2-7B-Chat-4K model using ModelScope.

from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    device_map="auto",
    trust_remote_code=True)

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

messages = []
#round-1
messages.append({"role": "user", "content": "介绍一下刘德华"})
response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
messages.append({"role": "assistant", "content": response})
print(messages)

#round-2
messages.append({"role": "user", "content": "他有什么代表作？"})
response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
messages.append({"role": "assistant", "content": response})
print(messages)

CLI Demo

Use the command-line interface for a quick try:

python cli_demo.py

Note: device = 'mps' on Mac is not yet supported.

Web Demo

You can also use the web interface for a quick try:

streamlit run web_demo.py

API Demo

Launch command

python openai_api.py

Request parameters

curl 'http://localhost:8360/v1/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "max_new_tokens": 200,
    "do_sample": true,
    "top_k": 0,
    "top_p": 0.8,
    "temperature": 1.0,
    "repetition_penalty": 1.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ]
}'
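The same request can be sent from Python using only the standard library. This is a minimal sketch: the URL and payload fields mirror the curl example above, while `build_request` and `send_chat` are hypothetical helper names, and the response is returned as raw text because its schema is not documented here.

```python
import json
import urllib.request

API_URL = "http://localhost:8360/v1/chat/completions"  # from the curl example


def build_request(user_message, url=API_URL):
    """Build a POST request with the same parameters as the curl example."""
    payload = {
        "max_new_tokens": 200,
        "do_sample": True,
        "top_k": 0,
        "top_p": 0.8,
        "temperature": 1.0,
        "repetition_penalty": 1.0,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(user_message, url=API_URL, timeout=60):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(user_message, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Once `python openai_api.py` is running, `print(send_chat("你好"))` should print the server's reply.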

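Model Inference

Quantization

We provide a quantization scheme based on AutoGPTQ and have open-sourced the Int4 quantized models.

Deployment

vLLM Installation

For deployment and faster inference, we recommend vLLM==0.3.3.

If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with the following command: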

pip install vllm==0.3.3

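Otherwise, please refer to the official vLLM installation instructions.

After installation, the following steps are also required:

Copy vllm/zhinao.py into the vllm/model_executor/models directory of your environment's vLLM installation.
Copy vllm/serving_chat.py into the vllm/entrypoints/openai directory of your environment's vLLM installation.
Then add the following line to vllm/model_executor/models/__init__.py: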
```
"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
```
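The two copy steps can be scripted as follows. This is a sketch under stated assumptions: `install_zhinao_files` is a hypothetical helper, `repo_dir` is assumed to be a local checkout of this repository containing the `vllm/` folder, and `vllm_dir` is the installed vLLM package directory. The registry line in `__init__.py` still has to be added by hand.

```python
import shutil
from pathlib import Path


def install_zhinao_files(repo_dir, vllm_dir):
    """Copy the two 360Zhinao files into an installed vLLM package.

    Returns the list of destination paths that were written.
    """
    repo_dir, vllm_dir = Path(repo_dir), Path(vllm_dir)
    # source file -> destination directory inside the vLLM package
    copies = {
        repo_dir / "vllm" / "zhinao.py": vllm_dir / "model_executor" / "models",
        repo_dir / "vllm" / "serving_chat.py": vllm_dir / "entrypoints" / "openai",
    }
    copied = []
    for src, dst_dir in copies.items():
        if not dst_dir.is_dir():
            raise FileNotFoundError(f"expected vLLM directory not found: {dst_dir}")
        # copy2 returns the path of the file created inside dst_dir
        copied.append(Path(shutil.copy2(src, dst_dir)))
    return copied
```

One way to locate `vllm_dir` without importing vLLM is `Path(importlib.util.find_spec("vllm").origin).parent`. Note that `serving_chat.py` overwrites the file that vLLM ships with, so keeping a backup may be worthwhile.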

Starting the vLLM Service

Start the service:

python -m vllm.entrypoints.openai.api_server \
    --served-model-name 360Zhinao2-7B-Chat-4K \
    --model qihoo360/360Zhinao2-7B-Chat-4K \
    --trust-remote-code \
    --tensor-parallel-size 1 \
    --max-model-len 4096 \
    --host 0.0.0.0 \
    --port 8360

Query the service with curl:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao2-7B-Chat-4K",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

Query the service with Python:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="360Zhinao2-7B-Chat-4K",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response)

Note: to enable a repetition penalty, we recommend using the presence_penalty and frequency_penalty parameters.

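Fine-tuning

Training Data

We provide sample fine-tuning data in data/test.json. It consists of 10,000 examples sampled from multiturn_chat_0.8M and converted to the format below.

Data format: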

[
  {
    "id": 1,
    "conversations": [
        {
            "from": "system",
            "value": "You are a helpful assistant."
        },
        {
            "from": "user",
            "value": "您好啊"
        },
        {
            "from": "assistant",
            "value": "你好！我今天能为您做些什么？有什么问题或需要帮助吗? 我在这里为您提供服务。"
        }
    ]
  }
]
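Before training on your own data, it can help to check that every record follows this layout. The validator below is a minimal sketch: `validate_sample` is a hypothetical helper, and restricting `from` to system, user, and assistant is an assumption based on the sample above, not a documented constraint of finetune.py.

```python
import json

# Roles seen in the sample record; treated here as the full allowed set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sample(sample):
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = []
    if "id" not in sample:
        problems.append("missing 'id'")
    turns = sample.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("'conversations' must be a non-empty list")
        return problems
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: must be an object")
            continue
        if turn.get("from") not in ALLOWED_ROLES:
            problems.append(f"turn {i}: unexpected 'from' value {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' must be a string")
    return problems


def validate_file(path):
    """Validate every record in a JSON file; returns {index: [problems]}."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    report = {i: validate_sample(s) for i, s in enumerate(data)}
    return {i: p for i, p in report.items() if p}
```

An empty dictionary from `validate_file` means every record passed these basic checks.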

Fine-tuning Training

The training script is as follows:

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=4096
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate samples up to the maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao2-7B-Base"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False

bash finetune/ds_finetune.sh

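Configure hostfile to run single-node or multi-node training.
Configure ds_config to select zero2 or zero3.
Configure fp16 or bf16 to enable mixed-precision training; bf16 is recommended for consistency with the pretrained model.
Configure the is_concat parameter to control whether training samples are concatenated; when the training set is large, concatenation can improve training efficiency.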
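As a rough worked example, the defaults in the script above imply the following global batch size per optimizer step. The sketch assumes plain data parallelism with one model replica per GPU, and `global_batch` is a hypothetical helper, not part of finetune.py.

```python
def global_batch(per_device_batch, gpus_per_node, num_nodes=1, grad_accum_steps=1):
    """Sequences consumed per optimizer step under data parallelism."""
    return per_device_batch * gpus_per_node * num_nodes * grad_accum_steps


# Defaults from the script: BATCH_SIZE=4, NUM_GPUS=8, NUM_NODES=1,
# gradient_accumulation_steps=1, MAX_LEN=4096.
sequences_per_step = global_batch(4, 8, num_nodes=1, grad_accum_steps=1)
max_tokens_per_step = sequences_per_step * 4096

print(sequences_per_step, max_tokens_per_step)  # 32 131072
```

The token figure is an upper bound: it is reached only when every sequence fills MAX_LEN.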

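License

The source code in this repository is licensed under the open-source Apache 2.0 license.

The 360Zhinao open-source models support free commercial use, with no need to apply to us for special permission.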
