Trainer/SFT Trainer. Showing learning curves

#1 opened by Karolos

Hi, thanks for the repo. Did you use the Trainer or the SFTTrainer class for this? Also, would you mind sharing your TensorBoard learning curves? I am trying to recreate your model with 4-bit QLoRA, but the model seems to overfit early. Another question: have you encountered problems during inference?

I have a problem where the model keeps generating additional responses even after giving a valid answer in the first sentence. Here is an example:

The opening ceremony of the 2024 Summer Olympics in Paris was controversial because drag queens re-enacted Leonardo da Vinci's painting The Last Supper. \n\nThe opening ceremony of the 2024 Summer Olympics in Paris was controversial because drag queens re-enacted Leonardo da Vinci's painting The Last Supper. \n\nThe opening ceremony of the 2024 Summer Olympics in Paris was controversial because drag queens re-enacted Leonardo da Vinci's painting The Last Supper. \n\nThe opening ceremony of the 2024 Summer Olympics in Paris was controversial because drag queens re-enacted Leonardo da Vinci's painting The Last Supper. \n\nThe opening cer
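The usual fix for this kind of looping is to make sure the model learns to emit an end-of-sequence (or end-of-turn) token and that generation stops on it. Until that works, a stdlib-only post-processing workaround can cut the output at the first repeated paragraph. This is only a sketch: the `truncate_repetitions` helper is hypothetical, and the `"\n\n"` separator is an assumption based on the example above. It hides the symptom rather than fixing training.

```python
def truncate_repetitions(text: str, sep: str = "\n\n") -> str:
    """Keep paragraphs up to the first one that repeats an earlier paragraph.

    Workaround only: it hides the missing stop token instead of fixing it.
    """
    seen = set()
    kept = []
    for paragraph in text.split(sep):
        key = paragraph.strip()
        if not key:
            continue  # ignore empty chunks between separators
        if key in seen:
            break  # the model started looping; drop this and everything after
        seen.add(key)
        kept.append(paragraph)
    return sep.join(kept).strip()


looping = ("The ceremony was controversial.\n\n"
           "The ceremony was controversial.\n\n"
           "The cer")
print(truncate_repetitions(looping))  # The ceremony was controversial.
```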
Karolos changed discussion title from Trainer/SFT Trainer to Trainer/SFT Trainer. Showing learning curves

Hello.
I used SFTTrainer with these arguments:

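from trl import SFTTrainer

max_seq_length = 4096  # maximum length (in tokens) of each packed sequence

# note: newer TRL releases expect these options on SFTConfig instead
trainer = SFTTrainer(
    model=model,
    args=args,  # training arguments defined elsewhere in the script
    train_dataset=dataset,
    max_seq_length=max_seq_length,
    packing=True,  # concatenate samples into fixed-length blocks
    dataset_kwargs={
        "add_special_tokens": False,  # the chat template already adds special tokens
        "append_concat_token": False  # no extra separator token between packed samples
    }
)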
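To see what `packing=True` with both `dataset_kwargs` flags set to `False` implies, here is a simplified pure-Python sketch of packing. It is not TRL's actual code, and the `pack_sequences` helper and token ids are hypothetical. Samples are concatenated into one token stream and cut into fixed-length blocks, so with `append_concat_token=False` the only boundary between samples is whatever end token the chat template itself already wrote into each sample.

```python
from typing import List


def pack_sequences(samples: List[List[int]], seq_length: int,
                   concat_token_id: int, append_concat_token: bool) -> List[List[int]]:
    """Simplified packing: concatenate tokenized samples, then cut fixed-length blocks."""
    stream: List[int] = []
    for ids in samples:
        stream.extend(ids)
        if append_concat_token:
            stream.append(concat_token_id)  # explicit separator after each sample
    # keep only full blocks; a trailing partial block is dropped
    return [stream[i:i + seq_length]
            for i in range(0, len(stream) - seq_length + 1, seq_length)]


EOS = 2  # hypothetical end-of-sequence id
samples = [[10, 11, EOS], [20, 21, 22, EOS], [30, 31]]
print(pack_sequences(samples, 4, EOS, append_concat_token=False))
# [[10, 11, 2, 20], [21, 22, 2, 30]]
```

If the formatted samples do not end with an end-of-turn/EOS token, the model never sees one during training, which is one plausible cause of the run-on generations described at the top of this thread.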

I am also attaching some learning curves from TensorBoard:
train_grad_norm.jpg
train_loss.jpg

I am not sure about the repetition problem. Are you using 4-bit quantization for inference?

Thanks a lot for the learning curves and parameters! I don't use 4-bit for inference, but I am training with SFTTrainer through Unsloth to speed up training; maybe that's the cause. Did you do anything with the , or other special tokens during fine-tuning, or did you just leave them as they are in the original Qra-7b tokenizer?

My setup is based on this article: https://www.philschmid.de/fine-tune-llms-in-2024-with-trl

This is my dataset preparation script:

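from datasets import load_dataset

# System prompt (Polish): "You are a friendly chatbot"
system_message = """Jesteś przyjaznym chatbotem"""

def create_conversation(sample) -> dict:
    """Convert one Alpaca-style record into the chat "messages" format."""
    strip_characters = "\"'"  # remove stray surrounding quotes
    instruction = sample["instruction"].strip(strip_characters)
    context = sample["input"].strip(strip_characters)
    return {
        "messages": [
            {"role": "system", "content": system_message},
            # join instruction and optional input; strip() drops the trailing
            # space when the input field is empty
            {"role": "user", "content": f"{instruction} {context}".strip()},
            {"role": "assistant",
             "content": sample["output"].strip(strip_characters)}
        ]
    }

dataset = load_dataset("s3nh/alpaca-dolly-instruction-only-polish", split="train")
dataset = dataset.shuffle(seed=42)
# drop the original columns so that only "messages" remains
dataset = dataset.map(create_conversation,
                      remove_columns=dataset.column_names, batched=False)
dataset = dataset.train_test_split(test_size=0.1)  # 90% train / 10% test

dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")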

And this is the tokenizer setup in the training script:

model_id = "Qra-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "right"
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
