Model Card for Model ID (to be completed)

This model was developed to fulfill the completion requirement of the Matsuo Lab LLM2024 course.

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

License: [More Information Needed]
Finetuned from model [optional]: llm-jp-3-13b

Model Sources [optional]


Training Details

Training Data

Base Model: llm-jp/llm-jp-3-13b
Data for Instruction Tuning: ichikara-
Data for DPO: https://huggingface.co/datasets/elyza/ELYZA-tasks-100

Training Procedure

Fine-tune the base model with instruction tuning (SFT).
Perform DPO on the fine-tuned model using generated preference data:
- Three similar prompts are generated for each sample prompt in the DPO data.
- The fine-tuned model generates two answers for each prompt.
- Due to time constraints, the first generated answer is labelled as the chosen answer.
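
The pair-building step above can be sketched as follows. This is a minimal illustration, not the code used for this model: `build_preference_pairs` and `make_fake_generator` are hypothetical names, the generator is a stand-in for sampling from the fine-tuned model, and labelling the second answer as "rejected" is an assumption (the card only states that the first answer is the chosen one). The `prompt` / `chosen` / `rejected` keys follow the record layout commonly used for DPO training.

```python
from typing import Callable


def build_preference_pairs(
    prompts: list[str],
    generate: Callable[[str], str],
) -> list[dict[str, str]]:
    """Sample two answers per prompt and label the first one as chosen.

    Treating the second answer as "rejected" is an assumption of this sketch.
    """
    pairs = []
    for prompt in prompts:
        first = generate(prompt)   # first sampled answer -> chosen
        second = generate(prompt)  # second sampled answer -> rejected (assumed)
        pairs.append({"prompt": prompt, "chosen": first, "rejected": second})
    return pairs


def make_fake_generator() -> Callable[[str], str]:
    """Hypothetical stand-in for sampling from the fine-tuned model."""
    counter = {"n": 0}

    def fake_generate(prompt: str) -> str:
        counter["n"] += 1
        return f"answer {counter['n']} to: {prompt}"

    return fake_generate


pairs = build_preference_pairs(["What is LoRA?"], make_fake_generator())
print(pairs[0])
```

Because the chosen label is assigned by generation order rather than by quality, the resulting preference signal is essentially arbitrary, which is worth keeping in mind when reading the DPO scores in the results.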

Training Hyperparameters

SFT for instruction tuning

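from unsloth import FastLanguageModel

max_seq_length = 512  # maximum token length per training example
dtype = None  # None lets Unsloth auto-detect the dtype
load_in_4bit = True  # load the base model with 4-bit quantization

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-it"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
    device_map="auto",
)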

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, #32
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, #0.05
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    max_seq_length = max_seq_length,
)
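
As a rough aid for reading the LoRA settings above: in standard LoRA the adapter update is scaled by `lora_alpha / r`, and with rank-stabilized LoRA (`use_rslora = True`) by `lora_alpha / sqrt(r)`. The sketch below only does that arithmetic; `lora_scaling` is a hypothetical helper, not part of Unsloth or PEFT.

```python
import math


def lora_scaling(r: int, lora_alpha: int, use_rslora: bool = False) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


# Active setting in the config above: r = 16, lora_alpha = 32
print(lora_scaling(16, 32))  # 2.0
# Alternative value noted in the comment: r = 32
print(lora_scaling(32, 32))  # 1.0
```

Doubling the rank while keeping `lora_alpha` fixed therefore halves the effective scale of the adapter update.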

Training Hyperparameters

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset["train"],
    max_seq_length = max_seq_length,
    dataset_text_field="formatted_text",
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        logging_steps = 10,
        warmup_steps = 5, #10
        save_steps=100,
        save_total_limit=2,
        max_steps= -1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        group_by_length=True,
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        # additional settings
        optim = "adamw_8bit", 
        weight_decay = 0.01
    ),
)
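
For orientation, the batch settings above imply an effective batch size of 8 sequences per optimizer step on a single GPU. The sketch below works through that arithmetic; `num_devices = 1` and the 1,000-example dataset size are assumptions for illustration only, and the step count is approximate because the trainer's handling of a final partial batch can differ.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1  # assumption: a single Colab GPU

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)


def optimizer_steps_per_epoch(num_examples: int) -> int:
    """Approximate number of optimizer updates in one epoch."""
    return math.ceil(num_examples / effective_batch_size)


print(effective_batch_size)             # 8
print(optimizer_steps_per_epoch(1000))  # 125 for a hypothetical 1,000 examples
```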

Experimental Trials

Instruction Tuning Only (model x data)
(using the hyperparameter values shown in the comments above: r = 32, lora_dropout = 0.05, warmup_steps = 10)
01 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (unmodified sample code provided)
02 - llm-jp-3-13b x ichikara-instruction-003-002-1.json
03 - Llama-3.1-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
04 - Llama-3.2-8B-Instruct-bnb-4bit x ichikara-instruction-003-001-1.json
05 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
09 - llm-jp-3-13b x kunishou/databricks-dolly-15k-ja

(using the active, non-commented hyperparameter values above: r = 16, lora_dropout = 0, warmup_steps = 5)
00 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
06 - gemma-2-9b-bnb-4bit x ichikara-instruction-003-001-1.json
07 - llm-jp-3-13b x ichikara-instruction-003-001-1.json
08 - llm-jp-3-13b x ichikara-instruction-003-001-1.json (with max_steps = 150)
10 - gemma-2-9b-bnb-4bit x kunishou/databricks-dolly-15k-ja

Instruction Tuning + DPO
11 - trial 00 + DPO
12 - trial 06 + DPO


Evaluation

Testing Data

The final performance of the model is evaluated on the elyza-tasks-100-TV dataset.

Metrics

The scores below were returned by the course management system after the model outputs were uploaded.

Results

Trial	Score
00	3.04
01	3.00
02	2.71
03	2.52
04	2.40
05	2.71
06	2.72
07	2.93
08	2.87
09	2.20
10	2.40
11	2.34
12	2.28
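
To make the effect of DPO easier to read off the table, the short sketch below compares each DPO trial with the instruction-tuned trial it started from. The scores are copied from the table above; the variable names are illustrative.

```python
# Scores copied from the results table above.
scores = {"00": 3.04, "06": 2.72, "11": 2.34, "12": 2.28}

# Each DPO trial and the instruction-tuned trial it builds on.
dpo_base = {"11": "00", "12": "06"}

deltas = {}
for dpo_trial, base_trial in dpo_base.items():
    deltas[dpo_trial] = round(scores[dpo_trial] - scores[base_trial], 2)
    print(f"trial {dpo_trial} vs {base_trial}: {deltas[dpo_trial]:+.2f}")
```

In both cases the score dropped after DPO (by 0.70 and 0.44 points respectively), so the best-scoring trial overall remains trial 00.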

Summary

This model corresponds to trial 11 (trial 00 followed by DPO), which scored 2.34 on the course evaluation system.

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

The model was trained on T4/L4/A100 GPUs on Google Colaboratory.
