---
base_model: t5-small
library_name: transformers
license: apache-2.0
tags:
  - generated_from_trainer
model-index:
  - name: en-af-sql-training-1727527893
    results: []
datasets:
  - b-mc2/sql-create-context
  - Clinton/Text-to-sql-v1
  - knowrohit07/know_sql
language:
  - af
  - en
pipeline_tag: text2text-generation
metrics:
  - Exact Match
  - TSED (Tree Similarity of Editing Distance)
  - SQAM (SQL Query Analysis Metric)
  - BLEU score
---

en-af-sql-training-1727527893

This model is a fine-tuned version of t5-small on three datasets: b-mc2/sql-create-context, Clinton/Text-to-sql-v1, and knowrohit07/know_sql. It achieves the following results on the evaluation set:

  • Loss: 0.0210

Model description

This is a fine-tuned Afrikaans-to-SQL model: it was obtained by fine-tuning the pretrained t5-small checkpoint to generate SQL queries from Afrikaans questions.

Training and Evaluation Datasets

As mentioned, to train the model we used a combination of three datasets, which we split into training, testing, and validation sets. The datasets can be found on the Hugging Face Hub: b-mc2/sql-create-context, Clinton/Text-to-sql-v1, and knowrohit07/know_sql.

We did an 80-10-10 split on each dataset and then combined the splits into a single DatasetDict object with train, test, and validation sets:

DatasetDict({
    train: Dataset({
        features: ['answer', 'question', 'context', 'afr question'],
        num_rows: 118692
    })
    test: Dataset({
        features: ['answer', 'question', 'context', 'afr question'],
        num_rows: 14838
    })
    validation: Dataset({
        features: ['answer', 'question', 'context', 'afr question'],
        num_rows: 14838
    })
})
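A minimal sketch of how this split-and-merge could be reproduced with the datasets library. Note the assumptions: the split seed, the split names, and the preprocessing that harmonizes the three datasets' columns to answer/question/context and adds the translated afr question column are ours, not from the original code.

from datasets import load_dataset, concatenate_datasets, DatasetDict

dataset_ids = [
    "b-mc2/sql-create-context",
    "Clinton/Text-to-sql-v1",
    "knowrohit07/know_sql",
]

splits = {"train": [], "test": [], "validation": []}
for dataset_id in dataset_ids:
    # Assumes each dataset ships a single "train" split and that its columns
    # have already been renamed/translated to match the features shown above.
    ds = load_dataset(dataset_id, split="train")
    # 80% train, 20% held out ...
    first = ds.train_test_split(test_size=0.2, seed=42)
    # ... and the held-out 20% split evenly into test and validation (10% each)
    holdout = first["test"].train_test_split(test_size=0.5, seed=42)
    splits["train"].append(first["train"])
    splits["test"].append(holdout["train"])
    splits["validation"].append(holdout["test"])

dataset = DatasetDict({name: concatenate_datasets(parts) for name, parts in splits.items()})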

The pretrained model was then fine-tuned on these dataset splits. Rather than using only the question, the model also takes in the schema context so that it can generate more accurate queries for a given database.

Input prompt

Table context: CREATE TABLE table_55794 (
    "Home team" text,
    "Home team score" text,
    "Away team" text,
    "Away team score" text,
    "Venue" text,
    "Crowd" real,
    "Date" text
)
Question: Watter tuisspan het'n span mebbourne?
Answer:

Expected Output

SELECT "Home team score" FROM table_55794 WHERE "Away team" = 'melbourne'

Intended uses & limitations

This model takes in a single prompt (similar to the one above), tokenizes it, and uses the resulting input_ids to generate an output SQL query. However, the prompt must be structured in a specific way.

The prompt must start with the table/schema description, followed by the question, followed by an empty answer; the Usage section below illustrates how to use the model. After tokenization, our combined dataset looks as follows (a preprocessing sketch follows the listing):

Tokenized Dataset

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 118692
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 14838
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 14838
    })
})
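A minimal sketch of preprocessing that could produce this tokenized dataset, assuming the prompt template from the Usage section below; the helper name tokenize_fn and the use of the Afrikaans question as input are our reading of the card, not the original code.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

def tokenize_fn(example):
    # Schema context + Afrikaans question + empty answer, as described above
    prompt = (
        f"Tables:\n{example['context']}\n\n"
        f"Question:\n{example['afr question']}\n\n"
        f"Answer:\n"
    )
    return {
        "input_ids": tokenizer(prompt, truncation=True)["input_ids"],
        "labels": tokenizer(example["answer"], truncation=True)["input_ids"],
    }

tokenized_datasets = dataset.map(
    tokenize_fn,
    remove_columns=["answer", "question", "context", "afr question"],
)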

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model and tokenizer from the Hugging Face Hub
repo_name = "JsteReubsSoftware/en-af-sql-training-1727527893"
en_af_sql_model = AutoModelForSeq2SeqLM.from_pretrained(repo_name, torch_dtype=torch.bfloat16)
en_af_sql_model = en_af_sql_model.to('cuda')
tokenizer = AutoTokenizer.from_pretrained(repo_name)

question = "Watter tuisspan het'n span mebbourne?"
context = """CREATE TABLE table_55794 (
    "Home team" text,
    "Home team score" text,
    "Away team" text,
    "Away team score" text,
    "Venue" text,
    "Crowd" real,
    "Date" text
)"""

prompt = f"""Tables:
{context}

Question:
{question}

Answer:
"""
inputs = tokenizer(prompt, return_tensors='pt')
inputs = inputs.to('cuda')

output = tokenizer.decode(
    en_af_sql_model.generate(
        inputs["input_ids"], 
        max_new_tokens=200,
    )[0], 
    skip_special_tokens=True
)

print("Predicted SQL Query:")
print(output)
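If no GPU is available, omit the two .to('cuda') calls and drop the torch_dtype argument to run the model on CPU in the default precision.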

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2

We used the following in our program (note that per_device_train_batch_size is 16; the effective batch size of 32 listed above presumably reflects the number of devices used):

import time
from transformers import Trainer, TrainingArguments

output_dir = f'./en-af-sql-training-{str(int(time.time()))}'

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=5e-3,
    num_train_epochs=2,
    per_device_train_batch_size=16,     # batch size per device during training
    per_device_eval_batch_size=16,      # batch size per device during evaluation
    weight_decay=0.01,
    logging_steps=50,
    evaluation_strategy='steps',        # evaluate every eval_steps steps
    eval_steps=500,                     # number of steps between evaluations
)

trainer = Trainer(
    model=finetuned_model,              # the t5-small model being fine-tuned
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
)

trainer.train()

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.0573        | 0.1348 | 500  | 0.0452          |
| 0.0424        | 0.2695 | 1000 | 0.0364          |
| 0.0370        | 0.4043 | 1500 | 0.0323          |
| 0.0356        | 0.5391 | 2000 | 0.0287          |
| 0.0328        | 0.6739 | 2500 | 0.0269          |
| 0.0281        | 0.8086 | 3000 | 0.0255          |
| 0.0286        | 0.9434 | 3500 | 0.0238          |
| 0.0269        | 1.0782 | 4000 | 0.0233          |
| 0.0247        | 1.2129 | 4500 | 0.0225          |
| 0.0245        | 1.3477 | 5000 | 0.0217          |
| 0.0226        | 1.4825 | 5500 | 0.0214          |
| 0.0245        | 1.6173 | 6000 | 0.0211          |
| 0.0240        | 1.7520 | 6500 | 0.0210          |
| 0.0249        | 1.8868 | 7000 | 0.0210          |

Testing results

After the model was trained and validated, we evaluated it using four evaluation metrics:

  • Exact Match Accuracy: measures how often the model predicts exactly the same SQL query as the target query.
  • TSED score (Tree Similarity of Editing Distance): ranges from 0 to 1 and compares the parse trees of the predicted and target queries, giving an estimate of the output query's execution accuracy.
  • SQAM accuracy (SQL Query Analysis Metric): similar to TSED, an estimate of the output query's execution accuracy based on comparing the queries' components.
  • BLEU score: measures the textual similarity between the output query and the target query.

The following results were obtained on the test set (14838 records):

  • Exact Match: 35.98%
  • TSED score: 0.897
  • SQAM score: 74.31%
  • BLEU score: 0.762
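For reference, a minimal sketch of how the Exact Match and BLEU numbers could be computed with the evaluate library; TSED and SQAM require their own implementations and are omitted, and predictions/references are hypothetical variables holding the decoded predicted and target queries over the test set.

import evaluate

# predictions: decoded SQL queries generated by the model (assumed variable)
# references: the corresponding target SQL queries (assumed variable)
exact_match = sum(
    pred.strip() == ref.strip() for pred, ref in zip(predictions, references)
) / len(references)

bleu = evaluate.load("bleu")
bleu_score = bleu.compute(
    predictions=predictions,
    references=[[ref] for ref in references],
)["bleu"]

print(f"Exact Match = {100 * exact_match:.2f} %")
print(f"BLEU score: {bleu_score:.3f}")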

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1