LLaMA Chinese 81M

A small bilingual (Chinese and English) pretrained language model.
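A minimal sketch of how a checkpoint like this is typically loaded for text generation. It assumes the Hugging Face `transformers` library and PyTorch are installed, and that the repository `p208p2002/llama-chinese-81M` can be loaded through the generic `Auto*` classes; the original card does not confirm this. The helper name `generate` and its defaults are hypothetical, chosen only for illustration.

```python
# Assumption: the checkpoint is loadable with the generic Auto* classes.
MODEL_ID = "p208p2002/llama-chinese-81M"


def generate(prompt, max_new_tokens=32):
    """Continue `prompt` with the pretrained model (hypothetical helper)."""
    # Imported inside the function so that merely defining this helper
    # does not require transformers to be installed or trigger a download.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Pass only input_ids so that any extra tokenizer outputs
    # (for example token_type_ids) are not forwarded to generate().
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("今天天氣")` would download the weights on first use. Because this is a pretrained model with no instruction tuning described, it is best thought of as continuing the prompt rather than answering it.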

Training Dataset

  • Chinese Wikipedia (20230601)
  • English Wikipedia (20230601)

Tokenizer

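The model uses a BPE tokenizer retrained on Chinese and English corpora, which gives better tokenization quality and more efficient encoding and decoding.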

https://github.com/p208p2002/BPE-tokenizer-from-zh-wiki
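To illustrate what training a BPE tokenizer on a corpus involves, here is a toy character-level sketch. It is not the implementation in the linked repository; the function names `train_bpe`, `apply_merge`, and `encode` are hypothetical, and production tokenizers add details (byte-level fallback, pre-tokenization, special tokens) that are omitted here.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one symbol."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of strings."""
    # Start from single characters as the base symbols.
    sequences = [list(text) for text in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges gain nothing
        merges.append(best_pair)
        sequences = [apply_merge(seq, best_pair) for seq in sequences]
    return merges


def encode(text, merges):
    """Tokenize `text` by applying the learned merges in training order."""
    seq = list(text)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


merges = train_bpe(["維基百科", "維基", "維基"], num_merges=10)
print(encode("維基百科", merges))  # ['維基', '百', '科']
```

Frequent character pairs in the training text become single tokens, which is why a tokenizer retrained on Chinese and English text can represent that text with fewer tokens than one trained on a different corpus.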

Model size: 81M params (Safetensors, tensor type F32)
