flan-t5-base-openbsd-faq
This model is a fine-tuned version of google/flan-t5-base, trained on the ajsbsd/openbsd-faq dataset.
The dataset consists of questions derived from https://www.openbsd.org/faq/faq1.html, for use on ajsbsd.net.
It achieves the following results on the evaluation set:
- Loss: 2.2385
- Rouge1: 0.3935
- Rouge2: 0.3383
- Rougel: 0.3906
- Rougelsum: 0.3844
Model description
A text2text generation model fine-tuned from google/flan-t5-base to answer questions about OpenBSD.
Intended uses & limitations
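Intended for use as an OpenBSD question-answering chatbot. The training data covers only https://www.openbsd.org/faq/faq1.html, so questions outside that page are out of scope.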
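A minimal inference sketch follows, assuming the checkpoint is published under the repository id `ajsbsd/flan-t5-base-openbsd-faq` and that questions should carry the same prefix used in the training script. The helper names `build_prompt` and `answer`, and the `max_new_tokens` value, are illustrative choices and not part of the original card.

```python
PREFIX = "Please answer this question: "  # same prefix as the training script


def build_prompt(question: str) -> str:
    """Prepend the training-time task prefix to a raw question."""
    return PREFIX + question.strip()


def answer(
    question: str,
    model_id: str = "ajsbsd/flan-t5-base-openbsd-faq",  # assumed repository id
    max_new_tokens: int = 128,  # arbitrary cap chosen for this sketch
) -> str:
    """Generate an answer with the fine-tuned checkpoint."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    # Truncate the prompt to 128 tokens, matching the training-time input limit.
    inputs = tokenizer(
        build_prompt(question),
        return_tensors="pt",
        max_length=128,
        truncation=True,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `answer("What is OpenBSD?")` downloads the checkpoint on first use. The model and tokenizer are reloaded on every call, which keeps the sketch short; a real chatbot would load them once and reuse them.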
Training and evaluation data
Question/answer pairs written from https://www.openbsd.org/faq/faq1.html, formatted for text2text generation.
Training procedure
Trained on Google Colab with the following code:
!pip install -q transformers[torch] tokenizers datasets evaluate rouge_score sentencepiece huggingface_hub --upgrade
from huggingface_hub import notebook_login
notebook_login()
import nltk
from datasets import load_dataset
import evaluate
import numpy as np
from transformers import T5Tokenizer, DataCollatorForSeq2Seq
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer
# Load and split the dataset
dataset = load_dataset("ajsbsd/openbsd-faq")
dataset = dataset["train"].train_test_split(test_size=0.2)
#dataset = load_dataset("csv", data_files="./JEOPARDY_CSV.csv")
#dataset = dataset["train"].train_test_split(test_size=0.2)
# Load the tokenizer, model, and data collator
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
# Prefix every question with a task instruction
prefix = "Please answer this question: "
# Define our preprocessing function
def preprocess_function(examples):
    """Add prefix to the questions, tokenize the text, and set the labels"""
    # The "inputs" are the prefixed, tokenized questions:
    inputs = [prefix + doc for doc in examples["question"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)
    # The "labels" are the tokenized answers:
    labels = tokenizer(text_target=examples["answer"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
# Map the preprocessing function across our dataset
tokenized_dataset = dataset.map(preprocess_function, batched=True)
# Set up Rouge score for evaluation
nltk.download("punkt", quiet=True)
metric = evaluate.load("rouge")
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # decode preds and labels
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # rougeLSum expects newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return result
# Set up training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-base-openbsd-faq",
    evaluation_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    push_to_hub=False
)
# Set up trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
# Train the model
trainer.train()
trainer.push_to_hub()
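One step in `compute_metrics` above may be worth seeing in isolation: the data collator pads label sequences with -100 so the loss ignores those positions, but the tokenizer cannot decode -100, so the script swaps it for the pad token id before decoding. The sketch below reproduces that replacement with NumPy only. The token ids are made up for illustration, and the pad id is hard-coded to 0 (the value T5 tokenizers use) so that no tokenizer needs to be loaded.

```python
import numpy as np

PAD_TOKEN_ID = 0  # T5 pad id, hard-coded here instead of tokenizer.pad_token_id


def unmask_labels(labels: np.ndarray, pad_token_id: int = PAD_TOKEN_ID) -> np.ndarray:
    """Replace the -100 loss-masking value with a decodable pad token id."""
    return np.where(labels != -100, labels, pad_token_id)


# Two label rows with illustrative (made-up) token ids; the first row was padded.
batch = np.array([
    [71, 1782, 1, -100, -100],
    [27, 43, 3, 9, 1],
])
cleaned = unmask_labels(batch)
print(cleaned)
```

After the replacement every id is non-negative, and `batch_decode(..., skip_special_tokens=True)` drops the pad tokens from the decoded text.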
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
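To make the linear scheduler concrete, here is a small sketch of how the learning rate would decay over the run. It assumes no warmup steps (none are listed above) and uses the 45 total optimizer steps, 9 per epoch, reported in the training results table. The function `linear_lr` is a hypothetical helper written for this illustration, not the library implementation.

```python
def linear_lr(step: int, base_lr: float = 3e-4, total_steps: int = 45,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under a linear schedule."""
    if step < warmup_steps:
        # Linear ramp-up from 0 to base_lr (unused when warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each of the 5 epochs (9 steps per epoch).
end_of_epoch_lr = {epoch: linear_lr(epoch * 9) for epoch in range(1, 6)}
for epoch, lr in end_of_epoch_lr.items():
    print(f"epoch {epoch}: lr = {lr:.2e}")
```

Under these assumptions the rate starts at 3e-4, drops by 6e-5 per epoch, and reaches 0 at step 45.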
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
---|---|---|---|---|---|---|---|
No log | 1.0 | 9 | 2.2184 | 0.3985 | 0.3308 | 0.3878 | 0.3902 |
No log | 2.0 | 18 | 2.2060 | 0.4044 | 0.3231 | 0.3959 | 0.3937 |
No log | 3.0 | 27 | 2.2271 | 0.4063 | 0.3315 | 0.4006 | 0.3971 |
No log | 4.0 | 36 | 2.2251 | 0.4069 | 0.3366 | 0.4001 | 0.3937 |
No log | 5.0 | 45 | 2.2385 | 0.3935 | 0.3383 | 0.3906 | 0.3844 |
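The table can be read programmatically to see which epoch scored best on each metric. The sketch below copies the five rows above into tuples and picks the epoch with the lowest validation loss and the highest Rouge1; the variable names are illustrative.

```python
# (epoch, validation_loss, rouge1, rouge2, rougeL, rougeLsum), copied from the table
results = [
    (1, 2.2184, 0.3985, 0.3308, 0.3878, 0.3902),
    (2, 2.2060, 0.4044, 0.3231, 0.3959, 0.3937),
    (3, 2.2271, 0.4063, 0.3315, 0.4006, 0.3971),
    (4, 2.2251, 0.4069, 0.3366, 0.4001, 0.3937),
    (5, 2.2385, 0.3935, 0.3383, 0.3906, 0.3844),
]

best_by_loss = min(results, key=lambda row: row[1])
best_by_rouge1 = max(results, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]})")
print(f"highest Rouge1: epoch {best_by_rouge1[0]} ({best_by_rouge1[2]})")
```

Validation loss is lowest at epoch 2 and Rouge1 peaks at epoch 4, while the evaluation results reported at the top of this card match the epoch 5 row. The training script does not set `load_best_model_at_end`, so the final-epoch weights appear to be the ones that were pushed.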
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.14.7
- Tokenizers 0.15.0