PyTorch - Python Package Training

Overview

This directory provides code to fine-tune a transformer model (BERT-base) from the Hugging Face Transformers library for a sentiment analysis task. BERT (Bidirectional Encoder Representations from Transformers) is a transformer model pre-trained on a large corpus of unlabeled text in a self-supervised fashion. This sample uses the IMDB sentiment classification dataset. It shows how to package PyTorch training code and submit it to Vertex AI using a pre-built PyTorch container, with Python dependencies handled through a Python build script (setup.py).

Prerequisites

Set up your project by following the instructions in the documentation.
Change directories to this sample.

Directory Structure

trainer directory: all Python modules to train the model.
scripts directory: command-line scripts to train the model on Vertex AI.
setup.py: specifies the Python dependencies required for the training job. Vertex AI Training uses pip to install the package on the training instances allocated for the job.
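
As a rough illustration of the kind of metadata such a setup.py declares, the sketch below collects it in a plain dictionary. The package name, version, description, and dependency list are assumptions chosen for illustration, not the sample's actual values. In a real setup.py the dictionary would be passed to setuptools.setup(), as noted in the closing comment.

```python
# Hypothetical dependency list; the sample's real setup.py may pin
# different packages or versions.
REQUIRED_PACKAGES = ["transformers", "datasets", "tqdm"]

# Keyword arguments that a setup.py for the trainer package might declare.
SETUP_KWARGS = {
    "name": "trainer",                      # assumed distribution name
    "version": "0.1",                       # assumed version
    "packages": ["trainer"],                # the trainer directory described above
    "install_requires": REQUIRED_PACKAGES,  # installed by pip on the training instance
    "include_package_data": True,
    "description": "Fine-tune BERT for sentiment classification",
}

# In setup.py itself these would be handed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping the dependency list in a named constant makes it easy to see, at a glance, what pip will install before the trainer module starts.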

Trainer Modules

File Name	Purpose
metadata.py	Defines metadata for the classification task, such as the predefined model and dataset names and the target labels.
utils.py	Utility functions, such as data input functions to read data and a function to save the model to a GCS bucket.
model.py	Function to create a model with a sequence classification head from a pretrained model.
experiment.py	Runs the model training and evaluation experiment, and exports the final model.
task.py	Entry point to the trainer; initializes and parses the task arguments (hyperparameters).
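
To make the role of task.py more concrete, here is a minimal sketch of an argument parser for such an entry point, using the standard argparse module. Every flag name and default value below (including the bert-base-cased checkpoint name) is a hypothetical example; the sample's actual task.py may expose different arguments.

```python
import argparse


def get_args(argv=None):
    """Parse task arguments. All flags and defaults are illustrative assumptions."""
    parser = argparse.ArgumentParser(
        description="Fine-tune BERT for sentiment classification"
    )
    parser.add_argument("--model-name", default="bert-base-cased",
                        help="Pretrained checkpoint to fine-tune (assumed default)")
    parser.add_argument("--batch-size", type=int, default=16,
                        help="Training batch size")
    parser.add_argument("--learning-rate", type=float, default=2e-5,
                        help="Optimizer learning rate")
    parser.add_argument("--num-epochs", type=int, default=1,
                        help="Number of passes over the training data")
    parser.add_argument("--job-dir", default=None,
                        help="Local or GCS path where the final model is exported")
    # argv=None makes argparse read sys.argv[1:], which is what
    # `python -m trainer.task ...` would supply.
    return parser.parse_args(argv)


# Example: override two hyperparameters and keep the remaining defaults.
args = get_args(["--batch-size", "32", "--num-epochs", "2"])
print(args.batch_size, args.num_epochs, args.learning_rate)
```

Accepting an explicit argv list keeps the parser easy to exercise in tests without touching the real command line.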

Scripts

train-cloud.sh: submits a training job to Vertex AI.

How to run

For local testing, run:

cd python_package && python -m trainer.task

For cloud training, once the prerequisites are satisfied, update the BUCKET_NAME environment variable in scripts/train-cloud.sh. You may then run the following script to submit a Vertex AI training job:

source ./python_package/scripts/train-cloud.sh

Run on GPU

The provided trainer code runs on a GPU if one is available, including data loading and model creation.

To run the trainer code on a different GPU configuration or with the latest PyTorch pre-built container image, make the following changes to the training script (scripts/train-cloud.sh):

Update the PyTorch image URI to one of the PyTorch pre-built containers.
Update the worker-pool-spec in the gcloud command to include a GPU.
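
The two changes above both end up in the value of the --worker-pool-spec flag. The sketch below assembles such a value in Python, assuming the script calls gcloud ai custom-jobs create with a pre-built executor image. The helper name build_worker_pool_spec, the default machine type, and the accelerator chosen in the example are illustrative assumptions; check them against what train-cloud.sh actually uses.

```python
from typing import Optional


def build_worker_pool_spec(
    executor_image_uri: str,
    python_module: str = "trainer.task",
    machine_type: str = "n1-standard-8",   # assumed default machine type
    replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> str:
    """Return a --worker-pool-spec flag for `gcloud ai custom-jobs create`."""
    fields = [
        ("replica-count", replica_count),
        ("machine-type", machine_type),
    ]
    if accelerator_type is not None:
        # A GPU spec needs both the accelerator type and a positive count.
        if accelerator_count < 1:
            raise ValueError("accelerator_count must be >= 1 when a GPU is requested")
        fields.append(("accelerator-type", accelerator_type))
        fields.append(("accelerator-count", accelerator_count))
    fields.append(("executor-image-uri", executor_image_uri))
    fields.append(("python-module", python_module))
    return "--worker-pool-spec=" + ",".join(f"{key}={value}" for key, value in fields)


# Example: one NVIDIA T4 GPU with the PyTorch 1.7 GPU image named in this README.
print(build_worker_pool_spec(
    "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-7:latest",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
))
```

Not every accelerator can be attached to every machine type, so a real job should pick a combination that Vertex AI supports in the target region.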

Then run the script to submit a custom job to Vertex AI Training:

source ./scripts/train-cloud.sh

Versions

This sample uses the pre-built PyTorch GPU container for PyTorch 1.7:

us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-7:latest