Model Card for Model ID (to be completed)

This model was developed to fulfil the completion requirement of the Matsuo Lab LLM2024 course.

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • License: [More Information Needed]
  • Finetuned from model [optional]: llm-jp-3-13b

Model Sources [optional]


Training Details

Training Data

Training Procedure

  1. Fine-tune the base model with instruction tuning (SFT)
  2. Perform DPO on the fine-tuned model with generated data
    • Three similar prompts are generated for each sample prompt in the DPO data
    • The fine-tuned model is used to generate two answers for each prompt
    • Because of time constraints, the first generated answer is labelled as the chosen answer
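
The data-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for training: `build_preference_pairs` and the `generate` callable are hypothetical names, and treating the second answer as the rejected one is an assumption, since the procedure only states which answer is chosen. The `prompt` / `chosen` / `rejected` keys follow the layout commonly used for preference datasets.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    prompts: List[str],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Generate two answers per prompt and label the first one as chosen.

    `generate` is a hypothetical stand-in for sampling from the
    instruction-tuned model; it is not part of any library.
    """
    pairs = []
    for prompt in prompts:
        first = generate(prompt)
        second = generate(prompt)
        pairs.append(
            {
                "prompt": prompt,
                "chosen": first,     # first answer is chosen, as described above
                "rejected": second,  # assumption: the other answer is rejected
            }
        )
    return pairs


if __name__ == "__main__":
    # Toy generator that returns a different string on every call.
    counter = {"n": 0}

    def fake_generate(prompt: str) -> str:
        counter["n"] += 1
        return f"answer {counter['n']} to: {prompt}"

    for row in build_preference_pairs(["What is LoRA?"], fake_generate):
        print(row)
```

Because the chosen label depends only on generation order, pairs built this way carry no real quality signal, which may be one reason the DPO trials score lower than their SFT-only counterparts.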

Training Hyperparameters

SFT for instruction tuning

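from unsloth import FastLanguageModel

max_seq_length = 512  # maximum number of tokens per training sequence
dtype = None          # None lets Unsloth pick the dtype automatically
load_in_4bit = True   # load the base model with 4-bit quantization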

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, #32
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, #0.05
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)
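
As a side note on the LoRA settings above, the adapter update is scaled by `lora_alpha / r` in the standard LoRA formulation (or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The small helper below, whose name `lora_scaling` is hypothetical, compares the active setting with the value kept in the trailing comment.

```python
def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / (r ** 0.5)
    return lora_alpha / r


if __name__ == "__main__":
    print(lora_scaling(32, 16))  # active setting: r = 16, lora_alpha = 32
    print(lora_scaling(32, 32))  # value from the trailing comment: r = 32
```

With `lora_alpha` fixed at 32, halving the rank from 32 to 16 doubles the scaling factor from 1.0 to 2.0, so the two configurations differ in more than adapter capacity.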


from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset["train"],
    max_seq_length = max_seq_length,
    dataset_text_field="formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 2 * 4 = 8 per device
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 5, #10
        save_steps = 100,
        save_total_limit = 2,
        max_steps = -1,  # -1 disables the step cap, so num_train_epochs applies
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),  # fall back to fp16 when bf16 is unavailable
        bf16 = is_bfloat16_supported(),
        group_by_length = True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        # additional settings
        optim = "adamw_8bit",
        weight_decay = 0.01,
    ),
)
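
To make the batch settings above concrete, the sketch below works out the effective batch size and how many sequences a given number of optimizer steps covers. The helper names are hypothetical, and a single GPU is assumed (a typical Colab session); the step counts are the `warmup_steps = 5` setting and the `max_steps = 150` variant listed for trial 08.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def sequences_seen(steps: int, per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Upper bound on sequences processed in `steps` optimizer steps."""
    return steps * effective_batch_size(per_device, grad_accum, num_devices)


if __name__ == "__main__":
    # per_device_train_batch_size = 2, gradient_accumulation_steps = 4
    print(effective_batch_size(2, 4))  # sequences per optimizer step
    print(sequences_seen(5, 2, 4))     # sequences covered by warmup_steps = 5
    print(sequences_seen(150, 2, 4))   # sequences covered by max_steps = 150
```

Under these assumptions each optimizer step covers 8 sequences, the warmup spans 40 sequences, and a 150-step run sees at most 1200 sequences.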

Experimental Trials

Instruction Tuning Only (model x data)
(using the hyperparameter values given in the trailing comments of the code above: r = 32, lora_dropout = 0.05, warmup_steps = 10)
01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code provided)
02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja

(using the active, uncommented hyperparameter values in the code above: r = 16, lora_dropout = 0, warmup_steps = 5)
00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja

Instruction Tuning + DPO
11 - trial 00 + DPO
12 - trial 06 + DPO


Evaluation

Testing Data

The final performance of the model is evaluated on the elyza-tasks-100-TV dataset.

Metrics

The scores below were returned by the course management system after the model outputs were uploaded.

Results

Trial Score
00 3.04
01 3.00
02 2.71
03 2.52
04 2.40
05 2.71
06 2.72
07 2.93
08 2.87
09 2.20
10 2.40
11 2.34
12 2.28
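
The scores above can be compared programmatically. The short sketch below copies the values from the table, finds the best-scoring trial, and computes how each DPO trial differs from the SFT-only trial it was built on (11 from 00, 12 from 06); the variable names are illustrative only.

```python
# Scores copied from the results table above.
scores = {
    "00": 3.04, "01": 3.00, "02": 2.71, "03": 2.52, "04": 2.40,
    "05": 2.71, "06": 2.72, "07": 2.93, "08": 2.87, "09": 2.20,
    "10": 2.40, "11": 2.34, "12": 2.28,
}

# Each DPO trial mapped to the SFT-only trial it starts from.
dpo_base = {"11": "00", "12": "06"}

best_trial = max(scores, key=scores.get)
dpo_delta = {
    dpo: round(scores[dpo] - scores[base], 2)
    for dpo, base in dpo_base.items()
}

if __name__ == "__main__":
    print("best trial:", best_trial, scores[best_trial])
    for dpo, delta in dpo_delta.items():
        print(f"trial {dpo} vs trial {dpo_base[dpo]}: {delta:+.2f}")
```

In this table the SFT-only trial 00 has the highest score, and both DPO trials score lower than their respective starting points (by 0.70 and 0.44).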

Summary

This model is the result of trial 11 (trial 00 followed by DPO), which received a score of 2.34 from the course evaluation system.

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

The model was trained using T4/L4/A100 GPUs on Google Colaboratory.

