Tanuki-8B-Instruct

Model Details

Model type: Llama-3-8B-like pretrained language model
Total seen tokens: 280B

Params	Layers	Hidden size	Intermediate size	Attention Heads	KV Heads	Context length	RoPE Theta
8B	32	4096	14336	32	8	8192	500000
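
A few derived quantities follow from the table by simple arithmetic. The sketch below assumes the standard Llama-3 grouped-query attention layout (head dimension = hidden size / attention heads) and bfloat16 storage for the KV cache; neither is stated in this model card, and the variable names are illustrative.

```python
# Values copied from the table above.
layers = 32
hidden_size = 4096
attention_heads = 32
kv_heads = 8
context_length = 8192

# Assumption: head dimension is hidden_size / attention_heads, as in Llama-3.
head_dim = hidden_size // attention_heads
# Number of query heads sharing each key/value head under grouped-query attention.
queries_per_kv_head = attention_heads // kv_heads

# Assumption: the KV cache is stored in bfloat16 (2 bytes per value).
bytes_per_value = 2
# Keys and values (factor 2) for every layer and every KV head, per token.
kv_cache_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
kv_cache_bytes_full_context = kv_cache_bytes_per_token * context_length

print(head_dim)                                  # 128
print(queries_per_kv_head)                       # 4
print(kv_cache_bytes_per_token // 1024)          # 128 (KiB per token)
print(kv_cache_bytes_full_context // 1024 ** 3)  # 1 (GiB at 8192 tokens)
```

Under these assumptions, a single sequence at the full context length needs about 1 GiB of KV cache in addition to the model weights.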

Usage

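import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (bfloat16 weights, moved to the GPU).
tokenizer = AutoTokenizer.from_pretrained("hatakeyama-llm-team/Tanuki-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("hatakeyama-llm-team/Tanuki-8B-Instruct", torch_dtype=torch.bfloat16).to('cuda')

# System prompt (English): "Below is a combination of an instruction describing a task
# and an input with context. Write a response that appropriately satisfies the request."
# User message (English): "What is a tanuki?"
chat = [
    {"role": "system", "content": "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"},
    {"role": "user", "content": "たぬきってなんですか？"},
]

# Apply the chat template and tokenize in one step.
tokenized_input = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(model.device)

# Sample a response; gradients are not needed for inference.
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.05,
    )[0]

# Decode the full sequence (prompt followed by the generated response).
print(tokenizer.decode(output))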
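
The usage code decodes the whole output sequence, so the printed text includes the prompt as well as the response. For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the response can be isolated by slicing off the prompt. The helper name `generated_only` below is hypothetical, and the toy ids are made up for illustration.

```python
def generated_only(output_ids, prompt_length):
    """Return only the token ids that come after the prompt."""
    return output_ids[prompt_length:]


# Toy example with plain lists standing in for token id tensors.
prompt_ids = [1, 15, 27, 42]
full_output = prompt_ids + [8, 9, 2]
print(generated_only(full_output, len(prompt_ids)))  # [8, 9, 2]
```

With the usage code, and assuming `tokenized_input` is a tensor of shape (1, prompt_length), this would be called as `tokenizer.decode(generated_only(output, tokenized_input.shape[-1]), skip_special_tokens=True)`.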

Note: If you tokenize the prompt for generation with tokenizer.encode() instead of tokenizer.apply_chat_template, set add_special_tokens=False so that an EOS token is not appended to the end of the input.
Example: tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")
With tokenizer.apply_chat_template, add_special_tokens=False is the default, so no change is needed.
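
As a minimal sketch of the note above, the helper below encodes a prompt with add_special_tokens=False and checks that the result does not end in an EOS token. `encode_prompt` and `StubTokenizer` are hypothetical names: the stub only imitates a tokenizer that appends EOS when special tokens are enabled, so the example runs without downloading the model. With the real tokenizer, pass the object returned by `AutoTokenizer.from_pretrained` instead.

```python
def encode_prompt(tokenizer, input_text):
    """Encode text for generation and guard against a trailing EOS token."""
    ids = tokenizer.encode(input_text, add_special_tokens=False)
    eos_id = tokenizer.eos_token_id
    if ids and eos_id is not None and ids[-1] == eos_id:
        raise ValueError("prompt ends with an EOS token")
    return ids


class StubTokenizer:
    """Hypothetical stand-in that appends EOS when special tokens are enabled."""

    eos_token_id = 2

    def encode(self, text, add_special_tokens=True):
        # Made-up mapping from characters to ids, for illustration only.
        ids = [3 + (ord(ch) % 100) for ch in text]
        return ids + [self.eos_token_id] if add_special_tokens else ids


stub = StubTokenizer()
print(stub.encode("ab"))          # [100, 101, 2]  (EOS appended by default)
print(encode_prompt(stub, "ab"))  # [100, 101]     (no EOS)
```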

Model Variants

Instruction models
hatakeyama-llm-team/Tanuki-8B-Instruct
hatakeyama-llm-team/Tanuki-8B-Instruct-without-DPO
Pre-trained models
Tanuki-8B
Tanuki-8B-Before-Context-Length-Extension