{ "cells": [ { "cell_type": "markdown", "id": "a9fd592d-0ad0-4f1b-94f4-0e5ae9dbeec7", "metadata": { "id": "a9fd592d-0ad0-4f1b-94f4-0e5ae9dbeec7" }, "source": [ "# Fine-Tune a Causal Language Model for Dialogue Summarization" ] }, { "cell_type": "markdown", "id": "976f3255-166f-4e6f-aaf0-ef6c4703b9f7", "metadata": { "id": "976f3255-166f-4e6f-aaf0-ef6c4703b9f7" }, "source": [ "Fine-tune Meta's Llama 2 chat version for enhanced topic summarization creation of mutlitple choice question (MCQ). Llama 2 is a large language model (LLM) free for research and commercial use. It is one of the top-performing open-source LLM comparable to GPT-3.5 on several benchmarks.\n", "\n", "We will explore the use of Parameter Efficient Fine-Tuning (PEFT - lora) for fine-tuning, and evaluate the resulting model using ROUGE metrics." ] }, { "cell_type": "markdown", "id": "a3b6ce61-1b18-412e-8ea9-b1aa4b6ea33d", "metadata": { "id": "a3b6ce61-1b18-412e-8ea9-b1aa4b6ea33d" }, "source": [ "## Install the pre-requisites\n", "\n", "Uncomment the following if these python packages have been installed" ] }, { "cell_type": "code", "execution_count": null, "id": "4da7e887-227f-4cce-983a-83ab3405de65", "metadata": { "tags": [], "id": "4da7e887-227f-4cce-983a-83ab3405de65" }, "outputs": [], "source": [ "!pip install transformers datasets accelerate sentencepiece scipy peft bitsandbytes evaluate rouge_score" ] }, { "cell_type": "markdown", "id": "a39feac3-3c0f-4057-83e2-898a0329cbee", "metadata": { "id": "a39feac3-3c0f-4057-83e2-898a0329cbee" }, "source": [ "## Request access to Llama-2 weights\n", "\n", "You need to request for access to download the Llama 2 weights. You can either do so through this [link at Meta](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) or through your huggingface account at this [link](https://huggingface.co/meta-llama/Llama-2-7b). Once your request is approved, you will receive an email from Meta with instruction to download the Llama 2 weights, or email from Hugging Face informing you access has been granted.\n", "\n", "If you download the weights from Meta directly, you need to run a conversion script to convert the weights to huggingface format for use with huggingface transformer library." 
] }, { "cell_type": "code", "execution_count": null, "id": "4178ec51-ccf8-4d01-8224-e57c8b0683f1", "metadata": { "tags": [], "id": "4178ec51-ccf8-4d01-8224-e57c8b0683f1" }, "outputs": [], "source": [ "# %%bash\n", "# TRANSFORM=`python -c \"import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')\"`\n", "# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B" ] }, { "cell_type": "code", "execution_count": null, "id": "00149073-36b6-4317-818b-c8e0485cd8d2", "metadata": { "tags": [], "id": "00149073-36b6-4317-818b-c8e0485cd8d2", "outputId": "f9177be8-9d26-4968-b0b5-766d83b5ad62", "colab": { "base_uri": "https://localhost:8080/", "height": 145, "referenced_widgets": [ "b82346ca91ad4a73b2bbe3b5c2857807", "3cea202e1a1b45478775d05f27178538", "012d0b97008247b9a03142cf26d0e381", "7250673145e042cb8c952fdf3477f5a6", "8688ae1d1fd544b39cd6b043bbf9e957", "a13691db6f884feeab91a60df5f0e9a7", "a57267ffc803442ca85085b7a88c2933", "45b4e5284a1b4f4bb8b3e5f6746e3c46", "2b62002cf89442e5b5b1a5a5315a6db2", "5446b40d53804682948236d4ba98cdd6", "21ea43357d7e4b2d8b2a3f12d199a3ce", "eb24b1c2c039416a8b63910c9d254147", "0e3de9ad4e1a4affbe73c1ee046c6105", "ed850c64429740f5a9c624ba6a2a4060", "43e5ac30013846e59bd4030e794190b1", "22524832590147ee86f0ece7e4559664", "5133470b3e744f448a94db36c881bd8b", "0013b3a66f06485e90b804db70b2054d", "1d19e959ce4140b596dc72c9d57a193b", "badf45ea38774747ac770c65eb303a5d", "5c6d035439414189bc5af05c7f65bb97", "711d15c6f8504b5db01c30d9bcd56b88", "69baeff0cf2c47d9beb5776a632b4bd6", "6b945dc35b884080be8b3eeaf1f715dd", "8b07392a7fcf49a0acf82d194456a22b", "9dbed28f31ea4ec9b02d4c02e323bd57", "35846883e9594a789da8a543d4964f2e", "f3ad3dbf2df045688f734299b4ecc2cc", "b9154c984d884c36bb149c09dc661976", "0df6b38cecdb4285a3a32c21122ce738", "52b605e03c58484eb906bd4b92343a6b", "03b9c7c857284cdd85cec55f97d6e49f" ] } }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "VBox(children=(HTML(value='
datasets.arrow_dataset.Dataset
def __init__(arrow_table: Table, info: Optional[DatasetInfo]=None, split: Optional[NamedSplit]=None, indices_table: Optional[Table]=None, fingerprint: Optional[str]=None)
/usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py A Dataset backed by an Arrow table.\n", " \n", " " ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "code", "source": [ "print(dataset_train)\n", "print(dataset_test)\n", "print(dataset_val)" ], "metadata": { "id": "juh2RGTHjApA", "outputId": "38815e79-7b61-4476-8974-cbb2e39ccbfe", "colab": { "base_uri": "https://localhost:8080/" } }, "id": "juh2RGTHjApA", "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Dataset({\n", " features: ['expected_output', 'instruction', 'input_content'],\n", " num_rows: 684\n", "})\n", "Dataset({\n", " features: ['expected_output', 'instruction', 'input_content'],\n", " num_rows: 147\n", "})\n", "Dataset({\n", " features: ['expected_output', 'instruction', 'input_content'],\n", " num_rows: 147\n", "})\n" ] } ] }, { "cell_type": "markdown", "id": "f5b7da07-9192-4718-b10e-c47fa5634c02", "metadata": { "id": "f5b7da07-9192-4718-b10e-c47fa5634c02" }, "source": [ "Let's take a look at one of the samples." ] }, { "cell_type": "code", "execution_count": null, "id": "f1ce8a86-3bc2-44d6-9cf1-3dcda2085ede", "metadata": { "tags": [], "id": "f1ce8a86-3bc2-44d6-9cf1-3dcda2085ede", "outputId": "726ded67-4ddd-4601-f17e-b271b48b336b", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'expected_output': '[question]: Which of the following is an application of machine learning in image recognition? [option A]: Analyzing financial transactions for fraud detection. [option B]: Transcribing spoken words into text. [option C]: Classifying images based on their contents. [option D]: Predicting when machinery is likely to fail. [correct_answer]: C, [explanation]:Machine learning algorithms are used in image recognition systems to classify images based on their contents.',\n", " 'instruction': 'Create an MCQ on the application of machine learning in image recognition',\n", " 'input_content': ''}" ] }, "metadata": {}, "execution_count": 16 } ], "source": [ "dataset['train'][600]" ] }, { "cell_type": "markdown", "id": "e8358e2e-e29d-4d5c-89f9-f78959ef5527", "metadata": { "id": "e8358e2e-e29d-4d5c-89f9-f78959ef5527" }, "source": [ "## Test the Model with Zero Shot Inferencing\n", "\n", "Let's test the model with zero-shot inferencing (i.e. ask it to generate an MCQ without giving it any examples). You can see that the model struggles to produce a well-formed MCQ compared to the expected output, and mostly just repeats the passage." ] }, { "cell_type": "code", "execution_count": null, "id": "31def3d5-e863-4b34-8c2e-408a0cb8f0b4", "metadata": { "tags": [], "id": "31def3d5-e863-4b34-8c2e-408a0cb8f0b4", "outputId": "2d4059e9-4efd-4ac2-82e9-6f35dbaeb146", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "Create an Multiple choice question:\n", "Artificial neural networks are built on the principles of the structure and operation of human neurons. \n", "It is also known as neural networks or neural nets. An artificial neural network’s input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer. \n", "Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer. 
\n", "These connections are weighted, which means that the impacts of the inputs from the preceding layer are more or less optimized by giving each input a distinct weight. \n", "These weights are then adjusted during the training process to enhance the performance of the model. \n", "Artificial neurons, also known as units, are found in artificial neural networks. \n", "The whole Artificial Neural Network is composed of these artificial neurons, which are arranged in a series of layers. \n", "The complexities of neural networks will depend on the complexities of the underlying patterns in the dataset whether a layer has a dozen units or millions of units. \n", "Commonly, Artificial Neural Network has an input layer, an output layer as well as hidden layers. \n", "The input layer receives data from the outside world which the neural network needs to analyze or learn about. \n", "In a fully connected artificial neural network, there is an input layer and one or more hidden layers connected one after the other. \n", "Each neuron receives input from the previous layer neurons or the input layer. The output of one neuron becomes the input to other neurons in the next layer of the network, and this process continues until the final layer produces the output of the network. \n", "Then, after passing through one or more hidden layers, this data is transformed into valuable data for the output layer. Finally, the output layer provides an output in the form of an artificial neural network’s response to the data that comes in.\n", "\n", "---\n", "question:\n", "options A:\n", "options B:\n", "options C:\n", "options D:\n", "correct_answer:\n", "explanation: \n", "Please select the correct answer from the options given below.\n", "\n", "A) Artificial neural networks are built on the principles of the structure and operation of human neurons.\n", "B) An artificial neural network’s input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer.\n", "C) Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer.\n", "D) Artificial neurons, also known as units, are found in artificial neural networks.\n", "\n", "Please select the correct answer from the options given below.\n" ] } ], "source": [ "eval_prompt = \"\"\"\n", "Create an Multiple choice question:\n", "Artificial neural networks are built on the principles of the structure and operation of human neurons.\n", "It is also known as neural networks or neural nets. 
An artificial neural network\\u2019s input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer.\n", "Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer.\n", "These connections are weighted, which means that the impacts of the inputs from the preceding layer are more or less optimized by giving each input a distinct weight.\n", "These weights are then adjusted during the training process to enhance the performance of the model.\n", "Artificial neurons, also known as units, are found in artificial neural networks.\n", "The whole Artificial Neural Network is composed of these artificial neurons, which are arranged in a series of layers.\n", "The complexities of neural networks will depend on the complexities of the underlying patterns in the dataset whether a layer has a dozen units or millions of units.\n", "Commonly, Artificial Neural Network has an input layer, an output layer as well as hidden layers.\n", "The input layer receives data from the outside world which the neural network needs to analyze or learn about.\n", "In a fully connected artificial neural network, there is an input layer and one or more hidden layers connected one after the other.\n", "Each neuron receives input from the previous layer neurons or the input layer. The output of one neuron becomes the input to other neurons in the next layer of the network, and this process continues until the final layer produces the output of the network.\n", "Then, after passing through one or more hidden layers, this data is transformed into valuable data for the output layer. Finally, the output layer provides an output in the form of an artificial neural network\\u2019s response to the data that comes in.\n", "\n", "---\n", "question:\n", "options A:\n", "options B:\n", "options C:\n", "options D:\n", "correct_answer:\n", "explanation:\n", "\"\"\"\n", "\n", "model_input = tokenizer(eval_prompt, return_tensors=\"pt\").to(\"cuda\")\n", "\n", "model.eval()\n", "with torch.no_grad(): # no gradient update\n", " print(tokenizer.decode(model.generate(**model_input, max_new_tokens=200)[0], skip_special_tokens=True))" ] }, { "cell_type": "markdown", "id": "cf285722-2ac5-4e60-8cc6-06c048e6d723", "metadata": { "id": "cf285722-2ac5-4e60-8cc6-06c048e6d723" }, "source": [ "## Creating the instruction dataset\n", "\n", "We will now prepare our dataset to fine-tune our base model (instruction fine-tuning)." ] }, { "cell_type": "markdown", "id": "e4012814-8355-4d75-8d0e-e1efbd2ba8de", "metadata": { "tags": [], "id": "e4012814-8355-4d75-8d0e-e1efbd2ba8de" }, "source": [ "### Instruction prompt\n", "\n", "We need to convert the instruction + input and expected output (prompt-response) pairs into explicit instruction prompts for the LLM, as follows:\n", "\n", "```\n", "{'text': \"
"Step | Training Loss | Validation Loss\n",
"10 | 2.041100 | 1.894824\n",
"20 | 1.784500 | 1.614028\n",
"30 | 1.487900 | 1.455886\n",
"40 | 1.400600 | 1.362805\n",
"50 | 1.339200 | 1.303091\n",
"60 | 1.300000 | 1.256285\n",
"70 | 1.255700 | 1.216590\n"
],
"text/plain": [
" "
]
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n",
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
" warnings.warn(\n",
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"TrainOutput(global_step=200, training_loss=1.2242571496963501, metrics={'train_runtime': 3556.5472, 'train_samples_per_second': 0.225, 'train_steps_per_second': 0.056, 'total_flos': 1.6248515592192e+16, 'train_loss': 1.2242571496963501, 'epoch': 3.25})"
]
},
"metadata": {},
"execution_count": 33
}
],
"source": [
"# Start training\n",
"\n",
"trainer.train()"
]
},
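{
"cell_type": "markdown",
"metadata": {},
"source": [
"As stated in the introduction, the fine-tuned model is evaluated with ROUGE. The cell below is a minimal, hedged sketch of such an evaluation using the `evaluate` and `rouge_score` packages installed earlier (not necessarily the exact evaluation cell of this notebook); `generated_outputs` is a hypothetical list of strings, one generation per test sample, produced beforehand with the fine-tuned model, and it is scored against the `expected_output` references of the test split.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: score generated MCQs against the reference outputs with ROUGE.\n",
"# `generated_outputs` is a hypothetical list of strings, one per test sample,\n",
"# produced beforehand with the fine-tuned model (e.g. via model.generate).\n",
"import evaluate\n",
"\n",
"rouge = evaluate.load(\"rouge\")\n",
"references = dataset_test['expected_output']\n",
"scores = rouge.compute(predictions=generated_outputs, references=references)\n",
"print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum\n"
]
},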
{
"cell_type": "code",
"execution_count": null,
"id": "93524441-36a1-4f7e-8307-b855e68c5c14",
"metadata": {
"tags": [],
"id": "93524441-36a1-4f7e-8307-b855e68c5c14",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 178
},
"outputId": "b6424c6a-3ff9-47e9-9a7c-f0137908a733"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization\n",
" warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" Step \n",
" Training Loss \n",
" Validation Loss \n",
" \n",
" \n",
" 10 \n",
" 2.041100 \n",
" 1.894824 \n",
" \n",
" \n",
" 20 \n",
" 1.784500 \n",
" 1.614028 \n",
" \n",
" \n",
" 30 \n",
" 1.487900 \n",
" 1.455886 \n",
" \n",
" \n",
" 40 \n",
" 1.400600 \n",
" 1.362805 \n",
" \n",
" \n",
" 50 \n",
" 1.339200 \n",
" 1.303091 \n",
" \n",
" \n",
" 60 \n",
" 1.300000 \n",
" 1.256285 \n",
" \n",
" \n",
" 70 \n",
" 1.255700 \n",
" 1.216590 \n",
" \n",
" \n",
" 80 \n",
" 1.150800 \n",
" 1.181747 \n",
" \n",
" \n",
" 90 \n",
" 1.175700 \n",
" 1.151433 \n",
" \n",
" \n",
" 100 \n",
" 1.148600 \n",
" 1.126270 \n",
" \n",
" \n",
" 110 \n",
" 1.083100 \n",
" 1.106496 \n",
" \n",
" \n",
" 120 \n",
" 1.075300 \n",
" 1.090339 \n",
" \n",
" \n",
" 130 \n",
" 1.081900 \n",
" 1.076817 \n",
" \n",
" \n",
" 140 \n",
" 1.029900 \n",
" 1.066747 \n",
" \n",
" \n",
" 150 \n",
" 1.034400 \n",
" 1.057581 \n",
" \n",
" \n",
" 160 \n",
" 1.055500 \n",
" 1.050728 \n",
" \n",
" \n",
" 170 \n",
" 1.042700 \n",
" 1.044938 \n",
" \n",
" \n",
" 180 \n",
" 1.039200 \n",
" 1.041671 \n",
" \n",
" \n",
" 190 \n",
" 0.981000 \n",
" 1.039009 \n",
" \n",
" \n",
" \n",
"200 \n",
" 0.978100 \n",
" 1.038632 \n",
"