flan-t5-base-openbsd-faq

This model is a fine-tuned version of google/flan-t5-base, trained on the ajsbsd/openbsd-faq dataset.

The dataset contains questions drawn from https://www.openbsd.org/faq/faq1.html, and the model is intended for use on ajsbsd.net.

It achieves the following results on the evaluation set after the final (fifth) epoch:

  • Loss: 2.2385
  • Rouge1: 0.3935
  • Rouge2: 0.3383
  • RougeL: 0.3906
  • RougeLsum: 0.3844
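The ROUGE scores are F-measures computed with the evaluate library's rouge metric. To illustrate what Rouge1 measures, here is a minimal sketch of a simplified ROUGE-1 F1 (clipped unigram overlap) for one prediction/reference pair. It is an approximation under stated assumptions: it lowercases and splits on non-alphanumeric characters, roughly like the rouge_score tokenizer, but it skips the stemming that the training code enables (use_stemmer=True), so its numbers can differ from the reported ones. The function names and example strings are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lowercase and keep runs of letters/digits (approximates rouge_score, no stemming)
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure: harmonic mean of unigram precision and recall."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipped matches)
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative pair: all 6 predicted tokens match, 6 of 10 reference tokens are covered
score = rouge1_f(
    "OpenBSD is a free operating system",
    "OpenBSD is a free, multi-platform, UNIX-like operating system",
)
```

A short but fully correct answer scores perfect precision and partial recall, which is one reason the Rouge1 values above sit well below 1.0 even when answers are on topic.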

Model description

A fine-tuned version of google/flan-t5-base for text-to-text question answering about OpenBSD.

Intended uses & limitations

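An OpenBSD question-answering (Q/A) chatbot. The training data covers only the questions on the first page of the OpenBSD FAQ (faq1.html), so answers on other topics should not be expected to be reliable.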
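A minimal inference sketch, assuming the Hub repository ajsbsd/flan-t5-base-openbsd-faq contains the tokenizer files saved by the trainer. It reuses the same prefix the training code prepends to every question, since the model was only ever trained on prefixed inputs. The helper names (build_prompt, load_model, answer) and max_new_tokens=128 are illustrative choices, not part of the original card.

```python
from functools import lru_cache

# The prefix matches the training code; the model id is this card's Hub repository.
PREFIX = "Please answer this question: "
MODEL_ID = "ajsbsd/flan-t5-base-openbsd-faq"


def build_prompt(question: str) -> str:
    """Format a question the same way the training inputs were formatted."""
    return PREFIX + question.strip()


@lru_cache(maxsize=1)
def load_model():
    """Download (once) and cache the tokenizer and model from the Hub."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
    return tokenizer, model


def answer(question: str, max_new_tokens: int = 128) -> str:
    """Generate one answer with default (greedy) decoding."""
    tokenizer, model = load_model()
    inputs = tokenizer(build_prompt(question), return_tensors="pt",
                       max_length=128, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (the first call downloads roughly 1 GB of F32 weights):
# print(answer("What is OpenBSD?"))
```

Caching the loaded model with lru_cache keeps a chatbot from reloading the weights on every question.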

Training and evaluation data

Question/answer pairs created from https://www.openbsd.org/faq/faq1.html, formatted for text-to-text generation. The training code holds out 20% of the pairs as the evaluation set.

Training procedure

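Trained on Google Colab with the following code. It targets the library versions listed under Framework versions; newer Transformers releases renamed some arguments (for example, evaluation_strategy became eval_strategy), so running the --upgrade install today may require small adjustments.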

!pip install -q transformers[torch] tokenizers datasets evaluate rouge_score sentencepiece huggingface_hub --upgrade

from huggingface_hub import notebook_login
notebook_login()

import nltk
from datasets import load_dataset
import evaluate
import numpy as np
from transformers import T5Tokenizer, DataCollatorForSeq2Seq
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer

# Load and split the dataset
dataset = load_dataset("ajsbsd/openbsd-faq")
dataset = dataset["train"].train_test_split(test_size=0.2)  # hold out 20% for evaluation

# Load the tokenizer, model, and data collator
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# Prefix every question with a task instruction, FLAN-style
prefix = "Please answer this question: "

# Define our preprocessing function
def preprocess_function(examples):
    """Add prefix to the sentences, tokenize the text, and set the labels"""
    # The inputs are the prefixed questions:
    inputs = [prefix + doc for doc in examples["question"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)

    # The labels are the tokenized reference answers:
    labels = tokenizer(text_target=examples["answer"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Map the preprocessing function across our dataset
tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Set up Rouge score for evaluation
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # required by sent_tokenize in newer NLTK releases
metric = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds

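    # Decode predictions and labels; -100 marks ignored positions, so map it to the pad id first
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)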
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # rougeLsum expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]

    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return result

# Set up training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-base-openbsd-faq",
    evaluation_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    push_to_hub=False
)

# Set up trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# Train the model
trainer.train()

trainer.push_to_hub()
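The -100 handling in compute_metrics exists because DataCollatorForSeq2Seq pads labels with -100 (its default label_pad_token_id) so that the loss ignores padded positions; the tokenizer cannot decode -100, so those positions must be mapped back to the pad id before batch_decode. A minimal NumPy sketch of that step, with a hypothetical helper name and toy token ids:

```python
import numpy as np

IGNORE_INDEX = -100  # label value skipped by the loss; used by the collator for padding


def restore_pad_ids(token_ids: np.ndarray, pad_token_id: int) -> np.ndarray:
    """Replace -100 placeholders with the tokenizer's pad id so decoding works."""
    return np.where(token_ids != IGNORE_INDEX, token_ids, pad_token_id)


# Toy batch: one label sequence padded with -100 (T5's pad id is 0)
labels = np.array([[5, 6, 1, IGNORE_INDEX, IGNORE_INDEX]])
restored = restore_pad_ids(labels, pad_token_id=0)
```

Because decoding uses skip_special_tokens=True, the restored pad ids then disappear from the decoded text.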

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • weight_decay: 0.01
  • lr_scheduler_type: linear
  • num_epochs: 5
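With a linear scheduler and no warmup (the code sets none, and the Transformers default is 0 warmup steps), the learning rate decays from 0.0003 to 0 over the run. The sketch below is our reading of that schedule, not the library's code: it assumes 45 total optimizer steps (9 steps per epoch times 5 epochs, taken from the results table) and returns the rate in effect at a 0-indexed step. The function name is hypothetical.

```python
def linear_lr(step: int, base_lr: float = 3e-4, total_steps: int = 45,
              warmup_steps: int = 0) -> float:
    """Learning rate at optimizer step `step` under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 during warmup (unused here: warmup_steps=0)
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly to 0 at total_steps, never going negative
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Rate at the start of each epoch (steps 0, 9, 18, 27, 36)
epoch_starts = [linear_lr(s) for s in range(0, 45, 9)]
```

Each epoch therefore starts at 0.0003, 0.00024, 0.00018, 0.00012 and 0.00006 respectively.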

Training results

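| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 9 | 2.2184 | 0.3985 | 0.3308 | 0.3878 | 0.3902 |
| No log | 2.0 | 18 | 2.2060 | 0.4044 | 0.3231 | 0.3959 | 0.3937 |
| No log | 3.0 | 27 | 2.2271 | 0.4063 | 0.3315 | 0.4006 | 0.3971 |
| No log | 4.0 | 36 | 2.2251 | 0.4069 | 0.3366 | 0.4001 | 0.3937 |
| No log | 5.0 | 45 | 2.2385 | 0.3935 | 0.3383 | 0.3906 | 0.3844 |

Training loss shows "No log" because the Trainer's default logging interval (500 steps) is longer than the 45-step run. Validation loss was lowest after epoch 2 (2.2060); since load_best_model_at_end is not set, the pushed model is the final-epoch one.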
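The Step column grows by 9 per epoch, which bounds the size of the training split. Assuming a single device, no gradient accumulation, and the default of keeping the last partial batch (dataloader_drop_last=False), each epoch takes ceil(n_train / 8) steps. The sketch below, with a hypothetical helper name, finds every training-set size consistent with 9 steps per epoch. It is an inference from the table, not a figure reported by the author.

```python
import math


def steps_per_epoch(n_examples: int, batch_size: int = 8) -> int:
    # The last partial batch is kept, so round up
    return math.ceil(n_examples / batch_size)


# Training-set sizes that produce exactly 9 steps per epoch at batch size 8
consistent_sizes = [n for n in range(1, 200) if steps_per_epoch(n) == 9]
```

Under these assumptions the training split holds between 65 and 72 examples, so the evaluation split is small and epoch-to-epoch metric differences in the table may be partly noise.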

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.14.7
  • Tokenizers 0.15.0

Model size: 248M params, F32 tensors (Safetensors)