Uploaded model

  • Developed by: Naotaka
  • License: apache-2.0
  • Finetuned from model: google/gemma-2-9b

This Gemma 2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
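For reference, below is a minimal sketch of an Unsloth + TRL fine-tuning setup of the kind referred to above, reusing the hyperparameters from the Usage section (learning rate 2e-4, batch size 8, LoRA r=16 / alpha=32, max_seq_length 768). The dataset contents, target modules, and epoch count are illustrative assumptions, not the exact training script.

from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit, mirroring the settings used in the Usage section
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-2-9b",
    max_seq_length=768,
    dtype=None,
    load_in_4bit=True,
)

# Attach LoRA adapters with the values used below (r=16, alpha=32)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset in the same "### 指示 / ### 回答" prompt format used for inference
train_dataset = Dataset.from_list([
    {"text": "### 指示\n(instruction)\n### 回答\n(answer)"},
])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",   # newer TRL versions take these via SFTConfig instead
    max_seq_length=768,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()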

Usage

Execute the following code in Google Colab:

!pip install -U pip --quiet

######### Install an Unsloth fork that uses the exact model name (avoids substituting Unsloth's own pre-quantized variants)
!pip uninstall unsloth -y --quiet
!pip install -q --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/niryuu/unsloth.git@use-exact-model-name"
######### /Install an Unsloth fork that uses the exact model name

!pip install --upgrade torch --quiet
!pip install --upgrade xformers --quiet
!pip install -U peft --quiet
!pip install -U openai --quiet
!pip install -U transformers --quiet
!pip install -U bitsandbytes --quiet
!pip install -U accelerate --quiet
!pip install -U datasets --quiet
!pip install -U trl --quiet

# Install Flash Attention 2 for softcapping support
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3" --quiet

# Hugging Face token (register HF_TOKEN under Colab's Secrets beforehand)
from google.colab import userdata

HF_TOKEN = userdata.get('HF_TOKEN')

import torch
from unsloth import FastLanguageModel

dtype = None # None lets Unsloth pick the dtype automatically
load_in_4bit = True # load the model in 4-bit so it fits in GPU memory

# model_size = '27b'
# USE_OZAKI_DATA = True
# DATA_SAMPLING_RATE = 0.05


lr = 2e-4
per_device_train_batch_size = 8
lora_r = 16
lora_alpha = 32
max_seq_length = 768 # Unsloth supports RoPE scaling, so the context length can be set freely
max_seq_length_output = 512

####################################### Gemma2
USE_OZAKI_DATA = True
DATA_SAMPLING_RATE = 0.2
model_size = '9b'
model_id = f"google/gemma-2-{model_size}"
new_model_id = f"gemma-2-{model_size}-r{lora_r}-{max_seq_length}-{max_seq_length_output}-it"
lora_model_id = new_model_id+"_lora"
adapter_id = f"Naotaka/{lora_model_id}"

# Create the FastLanguageModel instance (loads the base model and tokenizer)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
)

# Apply the LoRA adapter
# Load the LoRA adapter (adapter_id) and attach it to the base model
from peft import PeftModel

model = PeftModel.from_pretrained(
    model,
    adapter_id,
    token=HF_TOKEN
)

# Load ELYZA-tasks-100-TV. Upload the dataset file beforehand.
# In the omnicampus development environment, drag and drop the task jsonl
# into the left file panel before running this cell.
import json
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Run the tasks with the fine-tuned model
from tqdm import tqdm

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

results = []
for dt in tqdm(datasets):
    input = dt["input"]

    prompt = f"""### 指示\n{input}\n### 回答\n"""

    inputs = tokenizer([prompt], return_tensors = "pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens = 368, use_cache = True, do_sample=False, repetition_penalty=1.2)
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

    results.append({"task_id": dt["task_id"], "input": input, "output": prediction})

# Save the results as JSONL
file_name = "./output.jsonl"
with open(file_name, 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
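
As an optional sanity check (not part of the original notebook), the saved file can be read back to inspect the first result:

import json

with open("./output.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())
print(first["task_id"])
print(first["output"])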