Tjdharani
/

minGPT

Model card Files Files and versions Community

File size: 55,400 Bytes

14f4c62

{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "#Building GPT"
      ],
      "metadata": {
        "id": "8FHnXpkTv_5f"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# We always start with a dataset to train on. Let's download the tiny shakespeare dataset\n",
        "!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YTPlvPQn-Zef",
        "outputId": "45f9c50f-d2c6-4629-cabe-d1378e2882a7"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2023-06-13 07:55:40--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1115394 (1.1M) [text/plain]\n",
            "Saving to: ‘input.txt’\n",
            "\n",
            "input.txt           100%[===================>]   1.06M  --.-KB/s    in 0.005s  \n",
            "\n",
            "2023-06-13 07:55:40 (199 MB/s) - ‘input.txt’ saved [1115394/1115394]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "with open('input.txt', 'r', encoding='utf-8') as f:\n",
        "    text = f.read()"
      ],
      "metadata": {
        "id": "mfIiqOSm-euI"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(\"length of dataset in characters:\", len(text))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4Qgkvnr0_N66",
        "outputId": "6063f096-78b7-40c1-c830-531594a0bb1a"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "length of dataset in characters: 1115394\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# let's look at the first 1000 characters\n",
        "print(text[:1000])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Qn9QIHwf_c-_",
        "outputId": "4f4f837a-7b53-43fd-807e-42d16b0519c6"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "First Citizen:\n",
            "Before we proceed any further, hear me speak.\n",
            "\n",
            "All:\n",
            "Speak, speak.\n",
            "\n",
            "First Citizen:\n",
            "You are all resolved rather to die than to famish?\n",
            "\n",
            "All:\n",
            "Resolved. resolved.\n",
            "\n",
            "First Citizen:\n",
            "First, you know Caius Marcius is chief enemy to the people.\n",
            "\n",
            "All:\n",
            "We know't, we know't.\n",
            "\n",
            "First Citizen:\n",
            "Let us kill him, and we'll have corn at our own price.\n",
            "Is't a verdict?\n",
            "\n",
            "All:\n",
            "No more talking on't; let it be done: away, away!\n",
            "\n",
            "Second Citizen:\n",
            "One word, good citizens.\n",
            "\n",
            "First Citizen:\n",
            "We are accounted poor citizens, the patricians good.\n",
            "What authority surfeits on would relieve us: if they\n",
            "would yield us but the superfluity, while it were\n",
            "wholesome, we might guess they relieved us humanely;\n",
            "but they think we are too dear: the leanness that\n",
            "afflicts us, the object of our misery, is as an\n",
            "inventory to particularise their abundance; our\n",
            "sufferance is a gain to them Let us revenge this with\n",
            "our pikes, ere we become rakes: for the gods know I\n",
            "speak this in hunger for bread, not in thirst for revenge.\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# here are all the unique characters that occur in this text\n",
        "chars = sorted(list(set(text)))\n",
        "vocab_size = len(chars)\n",
        "print(''.join(chars))\n",
        "print(vocab_size)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JN8_xJFY_zvq",
        "outputId": "d0ab20bb-c366-41af-9378-15ced2913126"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            " !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\n",
            "65\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# create a mapping from characters to integers \n",
        "stoi = { ch:i for i, ch in enumerate(chars)}\n",
        "itos = { i:ch for i, ch in enumerate(chars)}\n",
        "encode = lambda s: [stoi[c] for c in s] # sting to integer\n",
        "decode = lambda l: ''.join([itos[i] for i in l]) # integer to string\n",
        "\n",
        "print(encode(\"hii there\"))\n",
        "print(decode(encode(\"hii there\")))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "X1lJF7-IAjz_",
        "outputId": "18702fc0-b1c0-4675-b78a-e047a06f4887"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[46, 47, 47, 1, 58, 46, 43, 56, 43]\n",
            "hii there\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# let's now encode the entire text dataset and store it into torch.Tensor\n",
        "import torch # PyTorch\n",
        "data = torch.tensor(encode(text), dtype=torch.long)\n",
        "print(data.shape, data.dtype)\n",
        "print(data[:1000])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ML1pjHfLCJ_M",
        "outputId": "3f21fc94-ed1f-4bb5-b9db-0a1ad2e5b227"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "torch.Size([1115394]) torch.int64\n",
            "tensor([18, 47, 56, 57, 58,  1, 15, 47, 58, 47, 64, 43, 52, 10,  0, 14, 43, 44,\n",
            "        53, 56, 43,  1, 61, 43,  1, 54, 56, 53, 41, 43, 43, 42,  1, 39, 52, 63,\n",
            "         1, 44, 59, 56, 58, 46, 43, 56,  6,  1, 46, 43, 39, 56,  1, 51, 43,  1,\n",
            "        57, 54, 43, 39, 49,  8,  0,  0, 13, 50, 50, 10,  0, 31, 54, 43, 39, 49,\n",
            "         6,  1, 57, 54, 43, 39, 49,  8,  0,  0, 18, 47, 56, 57, 58,  1, 15, 47,\n",
            "        58, 47, 64, 43, 52, 10,  0, 37, 53, 59,  1, 39, 56, 43,  1, 39, 50, 50,\n",
            "         1, 56, 43, 57, 53, 50, 60, 43, 42,  1, 56, 39, 58, 46, 43, 56,  1, 58,\n",
            "        53,  1, 42, 47, 43,  1, 58, 46, 39, 52,  1, 58, 53,  1, 44, 39, 51, 47,\n",
            "        57, 46, 12,  0,  0, 13, 50, 50, 10,  0, 30, 43, 57, 53, 50, 60, 43, 42,\n",
            "         8,  1, 56, 43, 57, 53, 50, 60, 43, 42,  8,  0,  0, 18, 47, 56, 57, 58,\n",
            "         1, 15, 47, 58, 47, 64, 43, 52, 10,  0, 18, 47, 56, 57, 58,  6,  1, 63,\n",
            "        53, 59,  1, 49, 52, 53, 61,  1, 15, 39, 47, 59, 57,  1, 25, 39, 56, 41,\n",
            "        47, 59, 57,  1, 47, 57,  1, 41, 46, 47, 43, 44,  1, 43, 52, 43, 51, 63,\n",
            "         1, 58, 53,  1, 58, 46, 43,  1, 54, 43, 53, 54, 50, 43,  8,  0,  0, 13,\n",
            "        50, 50, 10,  0, 35, 43,  1, 49, 52, 53, 61,  5, 58,  6,  1, 61, 43,  1,\n",
            "        49, 52, 53, 61,  5, 58,  8,  0,  0, 18, 47, 56, 57, 58,  1, 15, 47, 58,\n",
            "        47, 64, 43, 52, 10,  0, 24, 43, 58,  1, 59, 57,  1, 49, 47, 50, 50,  1,\n",
            "        46, 47, 51,  6,  1, 39, 52, 42,  1, 61, 43,  5, 50, 50,  1, 46, 39, 60,\n",
            "        43,  1, 41, 53, 56, 52,  1, 39, 58,  1, 53, 59, 56,  1, 53, 61, 52,  1,\n",
            "        54, 56, 47, 41, 43,  8,  0, 21, 57,  5, 58,  1, 39,  1, 60, 43, 56, 42,\n",
            "        47, 41, 58, 12,  0,  0, 13, 50, 50, 10,  0, 26, 53,  1, 51, 53, 56, 43,\n",
            "         1, 58, 39, 50, 49, 47, 52, 45,  1, 53, 52,  5, 58, 11,  1, 50, 43, 58,\n",
            "         1, 47, 58,  1, 40, 43,  1, 42, 53, 52, 43, 10,  1, 39, 61, 39, 63,  6,\n",
            "         1, 39, 61, 39, 63,  2,  0,  0, 31, 43, 41, 53, 52, 42,  1, 15, 47, 58,\n",
            "        47, 64, 43, 52, 10,  0, 27, 52, 43,  1, 61, 53, 56, 42,  6,  1, 45, 53,\n",
            "        53, 42,  1, 41, 47, 58, 47, 64, 43, 52, 57,  8,  0,  0, 18, 47, 56, 57,\n",
            "        58,  1, 15, 47, 58, 47, 64, 43, 52, 10,  0, 35, 43,  1, 39, 56, 43,  1,\n",
            "        39, 41, 41, 53, 59, 52, 58, 43, 42,  1, 54, 53, 53, 56,  1, 41, 47, 58,\n",
            "        47, 64, 43, 52, 57,  6,  1, 58, 46, 43,  1, 54, 39, 58, 56, 47, 41, 47,\n",
            "        39, 52, 57,  1, 45, 53, 53, 42,  8,  0, 35, 46, 39, 58,  1, 39, 59, 58,\n",
            "        46, 53, 56, 47, 58, 63,  1, 57, 59, 56, 44, 43, 47, 58, 57,  1, 53, 52,\n",
            "         1, 61, 53, 59, 50, 42,  1, 56, 43, 50, 47, 43, 60, 43,  1, 59, 57, 10,\n",
            "         1, 47, 44,  1, 58, 46, 43, 63,  0, 61, 53, 59, 50, 42,  1, 63, 47, 43,\n",
            "        50, 42,  1, 59, 57,  1, 40, 59, 58,  1, 58, 46, 43,  1, 57, 59, 54, 43,\n",
            "        56, 44, 50, 59, 47, 58, 63,  6,  1, 61, 46, 47, 50, 43,  1, 47, 58,  1,\n",
            "        61, 43, 56, 43,  0, 61, 46, 53, 50, 43, 57, 53, 51, 43,  6,  1, 61, 43,\n",
            "         1, 51, 47, 45, 46, 58,  1, 45, 59, 43, 57, 57,  1, 58, 46, 43, 63,  1,\n",
            "        56, 43, 50, 47, 43, 60, 43, 42,  1, 59, 57,  1, 46, 59, 51, 39, 52, 43,\n",
            "        50, 63, 11,  0, 40, 59, 58,  1, 58, 46, 43, 63,  1, 58, 46, 47, 52, 49,\n",
            "         1, 61, 43,  1, 39, 56, 43,  1, 58, 53, 53,  1, 42, 43, 39, 56, 10,  1,\n",
            "        58, 46, 43,  1, 50, 43, 39, 52, 52, 43, 57, 57,  1, 58, 46, 39, 58,  0,\n",
            "        39, 44, 44, 50, 47, 41, 58, 57,  1, 59, 57,  6,  1, 58, 46, 43,  1, 53,\n",
            "        40, 48, 43, 41, 58,  1, 53, 44,  1, 53, 59, 56,  1, 51, 47, 57, 43, 56,\n",
            "        63,  6,  1, 47, 57,  1, 39, 57,  1, 39, 52,  0, 47, 52, 60, 43, 52, 58,\n",
            "        53, 56, 63,  1, 58, 53,  1, 54, 39, 56, 58, 47, 41, 59, 50, 39, 56, 47,\n",
            "        57, 43,  1, 58, 46, 43, 47, 56,  1, 39, 40, 59, 52, 42, 39, 52, 41, 43,\n",
            "        11,  1, 53, 59, 56,  0, 57, 59, 44, 44, 43, 56, 39, 52, 41, 43,  1, 47,\n",
            "        57,  1, 39,  1, 45, 39, 47, 52,  1, 58, 53,  1, 58, 46, 43, 51,  1, 24,\n",
            "        43, 58,  1, 59, 57,  1, 56, 43, 60, 43, 52, 45, 43,  1, 58, 46, 47, 57,\n",
            "         1, 61, 47, 58, 46,  0, 53, 59, 56,  1, 54, 47, 49, 43, 57,  6,  1, 43,\n",
            "        56, 43,  1, 61, 43,  1, 40, 43, 41, 53, 51, 43,  1, 56, 39, 49, 43, 57,\n",
            "        10,  1, 44, 53, 56,  1, 58, 46, 43,  1, 45, 53, 42, 57,  1, 49, 52, 53,\n",
            "        61,  1, 21,  0, 57, 54, 43, 39, 49,  1, 58, 46, 47, 57,  1, 47, 52,  1,\n",
            "        46, 59, 52, 45, 43, 56,  1, 44, 53, 56,  1, 40, 56, 43, 39, 42,  6,  1,\n",
            "        52, 53, 58,  1, 47, 52,  1, 58, 46, 47, 56, 57, 58,  1, 44, 53, 56,  1,\n",
            "        56, 43, 60, 43, 52, 45, 43,  8,  0,  0])\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# split the data into train and validation set\n",
        "n = int(0.9*len(data)) #train 90% data\n",
        "train_data = data[:n]\n",
        "val_data = data[n:]"
      ],
      "metadata": {
        "id": "F-6DyilNE7KM"
      },
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "block_size = 8\n",
        "train_data[:block_size+1]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "z79mbyx-GJC-",
        "outputId": "b4b90aae-90f9-4f07-bbc0-2f726b0ff4d3"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor([18, 47, 56, 57, 58,  1, 15, 47, 58])"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "x = train_data[:block_size]\n",
        "y = train_data[1:block_size+1]\n",
        "for t in range(block_size):\n",
        "    context = x[:t+1]\n",
        "    target = y[t]\n",
        "    print(f\"when input is {context} the target: {target}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5SQI_jZXGb7_",
        "outputId": "52404a4a-91dd-4757-9c7e-c30a8a2eb2a3"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "when input is tensor([18]) the target: 47\n",
            "when input is tensor([18, 47]) the target: 56\n",
            "when input is tensor([18, 47, 56]) the target: 57\n",
            "when input is tensor([18, 47, 56, 57]) the target: 58\n",
            "when input is tensor([18, 47, 56, 57, 58]) the target: 1\n",
            "when input is tensor([18, 47, 56, 57, 58,  1]) the target: 15\n",
            "when input is tensor([18, 47, 56, 57, 58,  1, 15]) the target: 47\n",
            "when input is tensor([18, 47, 56, 57, 58,  1, 15, 47]) the target: 58\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "torch.manual_seed(1337)\n",
        "batch_size = 4\n",
        "block_size = 8\n",
        "\n",
        "def get_batch(split):\n",
        "  # generate a small batch of data of inputs x and targets y\n",
        "  data = train_data if split == 'train' else val_data\n",
        "  ix = torch.randint(len(data) - block_size, (batch_size,))\n",
        "  x = torch.stack([data[i:i+block_size] for i in ix])\n",
        "  y = torch.stack([data[i+1:i+block_size+1] for i in ix])\n",
        "  return x, y\n",
        "\n",
        "xb, yb = get_batch('train')\n",
        "print('inputs:')\n",
        "print(xb.shape)\n",
        "print(xb)\n",
        "print('targets:')\n",
        "print(yb.shape)\n",
        "print(yb)\n",
        "\n",
        "print('----')\n",
        "\n",
        "for b in range(batch_size): # batch dimension\n",
        "    for t in range(block_size): # time dimension\n",
        "        context = xb[b, :t+1]\n",
        "        target = yb[b,t]\n",
        "        print(f\"when input is {context.tolist()} the target: {target}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "IAjhF0PTI1HF",
        "outputId": "245c0f68-9502-4633-d365-e411176a5a14"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "inputs:\n",
            "torch.Size([4, 8])\n",
            "tensor([[24, 43, 58,  5, 57,  1, 46, 43],\n",
            "        [44, 53, 56,  1, 58, 46, 39, 58],\n",
            "        [52, 58,  1, 58, 46, 39, 58,  1],\n",
            "        [25, 17, 27, 10,  0, 21,  1, 54]])\n",
            "targets:\n",
            "torch.Size([4, 8])\n",
            "tensor([[43, 58,  5, 57,  1, 46, 43, 39],\n",
            "        [53, 56,  1, 58, 46, 39, 58,  1],\n",
            "        [58,  1, 58, 46, 39, 58,  1, 46],\n",
            "        [17, 27, 10,  0, 21,  1, 54, 39]])\n",
            "----\n",
            "when input is [24] the target: 43\n",
            "when input is [24, 43] the target: 58\n",
            "when input is [24, 43, 58] the target: 5\n",
            "when input is [24, 43, 58, 5] the target: 57\n",
            "when input is [24, 43, 58, 5, 57] the target: 1\n",
            "when input is [24, 43, 58, 5, 57, 1] the target: 46\n",
            "when input is [24, 43, 58, 5, 57, 1, 46] the target: 43\n",
            "when input is [24, 43, 58, 5, 57, 1, 46, 43] the target: 39\n",
            "when input is [44] the target: 53\n",
            "when input is [44, 53] the target: 56\n",
            "when input is [44, 53, 56] the target: 1\n",
            "when input is [44, 53, 56, 1] the target: 58\n",
            "when input is [44, 53, 56, 1, 58] the target: 46\n",
            "when input is [44, 53, 56, 1, 58, 46] the target: 39\n",
            "when input is [44, 53, 56, 1, 58, 46, 39] the target: 58\n",
            "when input is [44, 53, 56, 1, 58, 46, 39, 58] the target: 1\n",
            "when input is [52] the target: 58\n",
            "when input is [52, 58] the target: 1\n",
            "when input is [52, 58, 1] the target: 58\n",
            "when input is [52, 58, 1, 58] the target: 46\n",
            "when input is [52, 58, 1, 58, 46] the target: 39\n",
            "when input is [52, 58, 1, 58, 46, 39] the target: 58\n",
            "when input is [52, 58, 1, 58, 46, 39, 58] the target: 1\n",
            "when input is [52, 58, 1, 58, 46, 39, 58, 1] the target: 46\n",
            "when input is [25] the target: 17\n",
            "when input is [25, 17] the target: 27\n",
            "when input is [25, 17, 27] the target: 10\n",
            "when input is [25, 17, 27, 10] the target: 0\n",
            "when input is [25, 17, 27, 10, 0] the target: 21\n",
            "when input is [25, 17, 27, 10, 0, 21] the target: 1\n",
            "when input is [25, 17, 27, 10, 0, 21, 1] the target: 54\n",
            "when input is [25, 17, 27, 10, 0, 21, 1, 54] the target: 39\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print (xb) # our input to the transformer"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Sy2A0cbXM1Bd",
        "outputId": "ba015f11-ee15-435e-b88a-2ad4164d7abe"
      },
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "tensor([[24, 43, 58,  5, 57,  1, 46, 43],\n",
            "        [44, 53, 56,  1, 58, 46, 39, 58],\n",
            "        [52, 58,  1, 58, 46, 39, 58,  1],\n",
            "        [25, 17, 27, 10,  0, 21,  1, 54]])\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import torch\n",
        "import torch.nn as nn\n",
        "from torch.nn import functional as F\n",
        "torch.manual_seed(1337)\n",
        "\n",
        "class BigramLanguageModel(nn.Module):\n",
        "\n",
        "    def __init__(self, vocab_size):\n",
        "        super().__init__()\n",
        "        # each token directly reads off the logits for the next token from a lookup table\n",
        "        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)\n",
        "\n",
        "    def forward(self, idx, targets=None):\n",
        "\n",
        "        # idx and targets are both (B,T) tensor of integers\n",
        "        logits = self.token_embedding_table(idx) # (B,T,C)\n",
        "        \n",
        "        if targets is None:\n",
        "            loss = None\n",
        "        else:\n",
        "            B, T, C = logits.shape\n",
        "            logits = logits.view(B*T, C)\n",
        "            targets = targets.view(B*T)\n",
        "            loss = F.cross_entropy(logits, targets)\n",
        "\n",
        "        return logits, loss\n",
        "    \n",
        "    def generate(self, idx, max_new_tokens):\n",
        "        # idx is (B, T) array of indices in the current context\n",
        "        for _ in range(max_new_tokens):\n",
        "            # get the predictions\n",
        "            logits, loss = self(idx)\n",
        "            # focus only on the last time step\n",
        "            logits = logits[:, -1, :] # becomes (B, C)\n",
        "            # apply softmax to get probabilities\n",
        "            probs = F.softmax(logits, dim=-1) # (B, C)\n",
        "            # sample from the distribution\n",
        "            idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)\n",
        "            # append sampled index to the running sequence\n",
        "            idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)\n",
        "        return idx\n",
        "\n",
        "m = BigramLanguageModel(vocab_size)\n",
        "logits, loss = m(xb, yb)\n",
        "print(logits.shape)\n",
        "print(loss)\n",
        "\n",
        "print(decode(m.generate(idx = torch.zeros((1, 1), dtype=torch.long), max_new_tokens=100)[0].tolist()))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JadlSYPFfV5i",
        "outputId": "48885ec2-7337-4d9b-8931-9db5b06ff04a"
      },
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "torch.Size([32, 65])\n",
            "tensor(4.8786, grad_fn=<NllLossBackward0>)\n",
            "\n",
            "Sr?qP-QWktXoL&jLDJgOLVz'RIoDqHdhsV&vLLxatjscMpwLERSPyao.qfzs$Ys$zF-w,;eEkzxjgCKFChs!iWW.ObzDnxA Ms$3\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# create a PyTorch optimizer\n",
        "optimizer = torch.optim.AdamW(m.parameters(), lr=1e-3)"
      ],
      "metadata": {
        "id": "kC6Sf0DkfZEs"
      },
      "execution_count": 14,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "batch_size = 32\n",
        "for steps in range(100):\n",
        "\n",
        "  xb, yb = get_batch('train')\n",
        "\n",
        "  logits, loss = m(xb, yb)\n",
        "  optimizer.zero_grad(set_to_none=True)\n",
        "  loss.backward()\n",
        "  optimizer.step()\n",
        "\n",
        "print(loss.item())"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eAdiWhq8mq0v",
        "outputId": "2210d81b-5438-4e35-9336-5f30567de53d"
      },
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "4.587916374206543\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(decode(m.generate(idx = torch.zeros((1, 1), dtype = torch.long), max_new_tokens=500)[0].tolist()))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9I0z9v9NnVcW",
        "outputId": "07133374-3061-41e3-9e0e-77ba644c3c94"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "xiKi-RJ:CgqVuUa!U?qMH.uk!sCuMXvv!CJFfx;LgRyJknOEti.?I&-gPlLyulId?XlaInQ'q,lT$\n",
            "3Q&sGlvHQ?mqSq-eON\n",
            "x?SP fUAfCAuCX:bOlgiRQWN:Mphaw\n",
            "tRLKuYXEaAXxrcq-gCUzeh3w!AcyaylgYWjmJM?Uzw:inaY,:C&OECW:vmGGJAn3onAuMgia!ms$Vb q-gCOcPcUhOnxJGUGSPJWT:.?ujmJFoiNL&A'DxY,prZ?qdT;hoo'dHooXXlxf'WkHK&u3Q?rqUi.kz;?Yx?C&u3Qbfzxlyh'Vl:zyxjKXgC?\n",
            "lv'QKFiBeviNxO'm!Upm$srm&TqViqiBD3HBP!juEOpmZJyF$Fwfy!PlvWPFC\n",
            "&WDdP!Ko,px\n",
            "x\n",
            "tREOE;AJ.BeXkylOVD3KHp$e?nD,.SFbWWI'ubcL!q-tU;aXmJ&uGXHxJXI&Z!gHRpajj;l.\n",
            "pTErIBjx;JKIgoCnLGXrJSP!AU-AcbczR?\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "#Mathematical Trick in self-attention"
      ],
      "metadata": {
        "id": "JPRFdk7pn7Xz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# toy example for M Mul for weighted Aggregation\n",
        "torch.manual_seed(42)\n",
        "a = torch.tril(torch.ones(3, 3))\n",
        "a = a / torch.sum(a, 1, keepdim=True)\n",
        "b = torch.randint(0,10,(3,2)).float()\n",
        "c = a @ b\n",
        "print('a=')\n",
        "print(a)\n",
        "print('--')\n",
        "print('b=')\n",
        "print(b)\n",
        "print('--')\n",
        "print('c=')\n",
        "print(c)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "z-XvQJi_u0HL",
        "outputId": "486bcbac-c42e-494c-e9a0-341779370076"
      },
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "a=\n",
            "tensor([[1.0000, 0.0000, 0.0000],\n",
            "        [0.5000, 0.5000, 0.0000],\n",
            "        [0.3333, 0.3333, 0.3333]])\n",
            "--\n",
            "b=\n",
            "tensor([[2., 7.],\n",
            "        [6., 4.],\n",
            "        [6., 5.]])\n",
            "--\n",
            "c=\n",
            "tensor([[2.0000, 7.0000],\n",
            "        [4.0000, 5.5000],\n",
            "        [4.6667, 5.3333]])\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "torch.manual_seed(1337)\n",
        "B,T,C = 4,8,2 # BATCH, TIME, CHANNELS\n",
        "x = torch.randn(B,T,C)\n",
        "x.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "8zInghO3v5yg",
        "outputId": "4f7a38e9-05a2-494b-eda1-2d8ca136fe03"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "torch.Size([4, 8, 2])"
            ]
          },
          "metadata": {},
          "execution_count": 18
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "xbow = torch.zeros((B,T,C))\n",
        "for b in range(B):\n",
        "  for t in range(T):\n",
        "    xprev = x[b, :t+1]\n",
        "    xbow[b,t] = torch.mean(xprev, 0)"
      ],
      "metadata": {
        "id": "kM4Az6f3xXwz"
      },
      "execution_count": 19,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "wei = torch.tril(torch.ones(T, T))\n",
        "wei = wei / wei.sum(1, keepdim=True)\n",
        "xbow2 = wei @ x\n",
        "torch.allclose(xbow, xbow2)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "j6mzu409x9qt",
        "outputId": "16c8abd7-5e22-4c7e-b2e4-fc53041411d2"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "tril = torch.tril(torch.ones(T, T))\n",
        "wei = torch.zeros((T, T))\n",
        "wei = wei.masked_fill(tril == 0, float('-inf'))\n",
        "wei = F.softmax(wei, dim=-1)\n",
        "xbow3 = wei @ x\n",
        "torch.allclose(xbow, xbow3)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Ez5cxjXjyeyA",
        "outputId": "8cf70b82-93bb-4b9a-c29c-50342c99ca0b"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Self-attention !\n",
        "torch.manual_seed(1337)\n",
        "B,T,C = 4,8,32\n",
        "x = torch.randn(B,T,C)\n",
        "\n",
        "# Single head perform self-attention\n",
        "head_size = 16\n",
        "key = nn.Linear(C, head_size, bias=False)\n",
        "query = nn.Linear(C, head_size, bias=False)\n",
        "value = nn.Linear(C, head_size, bias=False)\n",
        "k = key(x)\n",
        "q = query(x)\n",
        "wei = q @ k.transpose(-2, -1)\n",
        "\n",
        "tril = torch.tril(torch.ones(T, T))\n",
        "wei = wei.masked_fill(tril == 0, float('-inf'))\n",
        "wei = F.softmax(wei, dim=-1)\n",
        "\n",
        "v = value(x)\n",
        "out = wei @ v\n",
        "\n",
        "out.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "d4fbZKO_zJlE",
        "outputId": "61bfb573-3b08-4e83-aed1-cdb4be76ead8"
      },
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "torch.Size([4, 8, 16])"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "wei[0]"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5mUg8q-D1xJ3",
        "outputId": "24f9aa45-1d20-4bc6-8efb-af5f5fb9899c"
      },
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor([[1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],\n",
              "        [0.1574, 0.8426, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],\n",
              "        [0.2088, 0.1646, 0.6266, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],\n",
              "        [0.5792, 0.1187, 0.1889, 0.1131, 0.0000, 0.0000, 0.0000, 0.0000],\n",
              "        [0.0294, 0.1052, 0.0469, 0.0276, 0.7909, 0.0000, 0.0000, 0.0000],\n",
              "        [0.0176, 0.2689, 0.0215, 0.0089, 0.6812, 0.0019, 0.0000, 0.0000],\n",
              "        [0.1691, 0.4066, 0.0438, 0.0416, 0.1048, 0.2012, 0.0329, 0.0000],\n",
              "        [0.0210, 0.0843, 0.0555, 0.2297, 0.0573, 0.0709, 0.2423, 0.2391]],\n",
              "       grad_fn=<SelectBackward0>)"
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "k = torch.randn(B,T,head_size)\n",
        "q = torch.randn(B,T,head_size)\n",
        "wei = q @ k.transpose(-2, -1) * head_size**-0.5"
      ],
      "metadata": {
        "id": "L6Hz65jN11C5"
      },
      "execution_count": 25,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "k.var()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "opow74Yg82UN",
        "outputId": "7937ca44-b52d-4373-ae58-d0c1ed450fa7"
      },
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor(1.0449)"
            ]
          },
          "metadata": {},
          "execution_count": 26
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "q.var()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jEGJMlZh86lD",
        "outputId": "c093ea15-9db4-408b-8898-0192748f8ab2"
      },
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor(1.0700)"
            ]
          },
          "metadata": {},
          "execution_count": 27
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "wei.var()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "37djNLHJ88Gh",
        "outputId": "a3ba1d4b-bca5-41a2-afa5-f135056b80ba"
      },
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor(1.0918)"
            ]
          },
          "metadata": {},
          "execution_count": 28
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "torch.softmax(torch.tensor([0.1, -0.2, 0.3, -0.2, 0.5]), dim=-1)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3NK1li0w89wx",
        "outputId": "4205b108-d666-4add-dd3e-48da20a6e351"
      },
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor([0.1925, 0.1426, 0.2351, 0.1426, 0.2872])"
            ]
          },
          "metadata": {},
          "execution_count": 29
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "torch.softmax(torch.tensor([0.1, -0.2, 0.3,-0.2,0.5])*8, dim=-1)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-3UqDMG79QLI",
        "outputId": "61674514-3887-43a4-93aa-055dfcd61b76"
      },
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "tensor([0.0326, 0.0030, 0.1615, 0.0030, 0.8000])"
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "class LayerNorm1d: # (used to be BatchNorm1d)\n",
        "  \n",
        "  def __init__(self, dim, eps=1e-5, momentum=0.1):\n",
        "    self.eps = eps\n",
        "    self.gamma = torch.ones(dim)\n",
        "    self.beta = torch.zeros(dim)\n",
        "  \n",
        "  def __call__(self, x):\n",
        "    # calculate the forward pass\n",
        "    xmean = x.mean(1, keepdim=True) # batch mean\n",
        "    xvar = x.var(1, keepdim=True) # batch variance\n",
        "    xhat = (x - xmean) / torch.sqrt(xvar + self.eps) # normalize to unit variance\n",
        "    self.out = self.gamma * xhat + self.beta\n",
        "    return self.out\n",
        "  \n",
        "  def parameters(self):\n",
        "    return [self.gamma, self.beta]\n",
        "\n",
        "torch.manual_seed(1337)\n",
        "module = LayerNorm1d(100)\n",
        "x = torch.randn(32, 100) # batch size 32 of 100-dimensional vectors\n",
        "x = module(x)\n",
        "x.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "a_572UNcChia",
        "outputId": "87012d0d-81cd-4841-a4e8-48bf9c0e2e61"
      },
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "torch.Size([32, 100])"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "x[:, 0].mean(), x[:,0].std()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "LHfhDFW1Coel",
        "outputId": "7eff9314-f287-4566-aa4d-7d9082bff11b"
      },
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(tensor(0.1469), tensor(0.8803))"
            ]
          },
          "metadata": {},
          "execution_count": 33
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "x[0,:].mean(), x[0,:].std()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bt7xbja2FOu-",
        "outputId": "8f1cbfe0-7862-4ba0-bd54-7149a78b7153"
      },
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(tensor(-9.5367e-09), tensor(1.0000))"
            ]
          },
          "metadata": {},
          "execution_count": 34
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import torch\n",
        "import torch.nn as nn\n",
        "from torch.nn import functional as F\n",
        "\n",
        "# hyperparameters\n",
        "batch_size = 16 # how many independent sequences will we process in parallel?\n",
        "block_size = 32 # what is the maximum context length for predictions?\n",
        "max_iters = 5000\n",
        "eval_interval = 100\n",
        "learning_rate = 1e-3\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "eval_iters = 200\n",
        "n_embd = 64\n",
        "n_head = 4\n",
        "n_layer = 4\n",
        "dropout = 0.0\n",
        "\n",
        "torch.manual_seed(1337)\n",
        "\n",
        "# wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt\n",
        "with open('input.txt', 'r', encoding='utf-8') as f:\n",
        "    text = f.read()\n",
        "\n",
        "# here are all the unique characters that occur in this text\n",
        "chars = sorted(list(set(text)))\n",
        "vocab_size = len(chars)\n",
        "# create a mapping from characters to integers\n",
        "stoi = { ch:i for i,ch in enumerate(chars) }\n",
        "itos = { i:ch for i,ch in enumerate(chars) }\n",
        "encode = lambda s: [stoi[c] for c in s] # encoder: take a string, output a list of integers\n",
        "decode = lambda l: ''.join([itos[i] for i in l]) # decoder: take a list of integers, output a string\n",
        "\n",
        "# Train and test splits\n",
        "data = torch.tensor(encode(text), dtype=torch.long)\n",
        "n = int(0.9*len(data)) # first 90% will be train, rest val\n",
        "train_data = data[:n]\n",
        "val_data = data[n:]\n",
        "\n",
        "# data loading\n",
        "def get_batch(split):\n",
        "    # generate a small batch of data of inputs x and targets y\n",
        "    data = train_data if split == 'train' else val_data\n",
        "    ix = torch.randint(len(data) - block_size, (batch_size,))\n",
        "    x = torch.stack([data[i:i+block_size] for i in ix])\n",
        "    y = torch.stack([data[i+1:i+block_size+1] for i in ix])\n",
        "    x, y = x.to(device), y.to(device)\n",
        "    return x, y\n",
        "\n",
        "@torch.no_grad()\n",
        "def estimate_loss():\n",
        "    out = {}\n",
        "    model.eval()\n",
        "    for split in ['train', 'val']:\n",
        "        losses = torch.zeros(eval_iters)\n",
        "        for k in range(eval_iters):\n",
        "            X, Y = get_batch(split)\n",
        "            logits, loss = model(X, Y)\n",
        "            losses[k] = loss.item()\n",
        "        out[split] = losses.mean()\n",
        "    model.train()\n",
        "    return out\n",
        "\n",
        "class Head(nn.Module):\n",
        "    \"\"\" one head of self-attention \"\"\"\n",
        "\n",
        "    def __init__(self, head_size):\n",
        "        super().__init__()\n",
        "        self.key = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.query = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.value = nn.Linear(n_embd, head_size, bias=False)\n",
        "        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))\n",
        "\n",
        "        self.dropout = nn.Dropout(dropout)\n",
        "\n",
        "    def forward(self, x):\n",
        "        B,T,C = x.shape\n",
        "        k = self.key(x)   # (B,T,C)\n",
        "        q = self.query(x) # (B,T,C)\n",
        "        # compute attention scores (\"affinities\")\n",
        "        wei = q @ k.transpose(-2,-1) * C**-0.5 # (B, T, C) @ (B, C, T) -> (B, T, T)\n",
        "        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)\n",
        "        wei = F.softmax(wei, dim=-1) # (B, T, T)\n",
        "        wei = self.dropout(wei)\n",
        "        # perform the weighted aggregation of the values\n",
        "        v = self.value(x) # (B,T,C)\n",
        "        out = wei @ v # (B, T, T) @ (B, T, C) -> (B, T, C)\n",
        "        return out\n",
        "\n",
        "class MultiHeadAttention(nn.Module):\n",
        "    \"\"\" multiple heads of self-attention in parallel \"\"\"\n",
        "\n",
        "    def __init__(self, num_heads, head_size):\n",
        "        super().__init__()\n",
        "        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])\n",
        "        self.proj = nn.Linear(n_embd, n_embd)\n",
        "        self.dropout = nn.Dropout(dropout)\n",
        "\n",
        "    def forward(self, x):\n",
        "        out = torch.cat([h(x) for h in self.heads], dim=-1)\n",
        "        out = self.dropout(self.proj(out))\n",
        "        return out\n",
        "\n",
        "class FeedFoward(nn.Module):\n",
        "    \"\"\" a simple linear layer followed by a non-linearity \"\"\"\n",
        "\n",
        "    def __init__(self, n_embd):\n",
        "        super().__init__()\n",
        "        self.net = nn.Sequential(\n",
        "            nn.Linear(n_embd, 4 * n_embd),\n",
        "            nn.ReLU(),\n",
        "            nn.Linear(4 * n_embd, n_embd),\n",
        "            nn.Dropout(dropout),\n",
        "        )\n",
        "\n",
        "    def forward(self, x):\n",
        "        return self.net(x)\n",
        "\n",
        "class Block(nn.Module):\n",
        "    \"\"\" Transformer block: communication followed by computation \"\"\"\n",
        "\n",
        "    def __init__(self, n_embd, n_head):\n",
        "        # n_embd: embedding dimension, n_head: the number of heads we'd like\n",
        "        super().__init__()\n",
        "        head_size = n_embd // n_head\n",
        "        self.sa = MultiHeadAttention(n_head, head_size)\n",
        "        self.ffwd = FeedFoward(n_embd)\n",
        "        self.ln1 = nn.LayerNorm(n_embd)\n",
        "        self.ln2 = nn.LayerNorm(n_embd)\n",
        "\n",
        "    def forward(self, x):\n",
        "        x = x + self.sa(self.ln1(x))\n",
        "        x = x + self.ffwd(self.ln2(x))\n",
        "        return x\n",
        "\n",
        "# super simple bigram model\n",
        "class BigramLanguageModel(nn.Module):\n",
        "\n",
        "    def __init__(self):\n",
        "        super().__init__()\n",
        "        # each token directly reads off the logits for the next token from a lookup table\n",
        "        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)\n",
        "        self.position_embedding_table = nn.Embedding(block_size, n_embd)\n",
        "        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])\n",
        "        self.ln_f = nn.LayerNorm(n_embd) # final layer norm\n",
        "        self.lm_head = nn.Linear(n_embd, vocab_size)\n",
        "\n",
        "    def forward(self, idx, targets=None):\n",
        "        B, T = idx.shape\n",
        "\n",
        "        # idx and targets are both (B,T) tensor of integers\n",
        "        tok_emb = self.token_embedding_table(idx) # (B,T,C)\n",
        "        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,C)\n",
        "        x = tok_emb + pos_emb # (B,T,C)\n",
        "        x = self.blocks(x) # (B,T,C)\n",
        "        x = self.ln_f(x) # (B,T,C)\n",
        "        logits = self.lm_head(x) # (B,T,vocab_size)\n",
        "\n",
        "        if targets is None:\n",
        "            loss = None\n",
        "        else:\n",
        "            B, T, C = logits.shape\n",
        "            logits = logits.view(B*T, C)\n",
        "            targets = targets.view(B*T)\n",
        "            loss = F.cross_entropy(logits, targets)\n",
        "\n",
        "        return logits, loss\n",
        "\n",
        "    def generate(self, idx, max_new_tokens):\n",
        "        # idx is (B, T) array of indices in the current context\n",
        "        for _ in range(max_new_tokens):\n",
        "            # crop idx to the last block_size tokens\n",
        "            idx_cond = idx[:, -block_size:]\n",
        "            # get the predictions\n",
        "            logits, loss = self(idx_cond)\n",
        "            # focus only on the last time step\n",
        "            logits = logits[:, -1, :] # becomes (B, C)\n",
        "            # apply softmax to get probabilities\n",
        "            probs = F.softmax(logits, dim=-1) # (B, C)\n",
        "            # sample from the distribution\n",
        "            idx_next = torch.multinomial(probs, num_samples=1) # (B, 1)\n",
        "            # append sampled index to the running sequence\n",
        "            idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)\n",
        "        return idx\n",
        "\n",
        "model = BigramLanguageModel()\n",
        "m = model.to(device)\n",
        "# print the number of parameters in the model\n",
        "print(sum(p.numel() for p in m.parameters())/1e6, 'M parameters')\n",
        "\n",
        "# create a PyTorch optimizer\n",
        "optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)\n",
        "\n",
        "for iter in range(max_iters):\n",
        "\n",
        "    # every once in a while evaluate the loss on train and val sets\n",
        "    if iter % eval_interval == 0 or iter == max_iters - 1:\n",
        "        losses = estimate_loss()\n",
        "        print(f\"step {iter}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}\")\n",
        "\n",
        "    # sample a batch of data\n",
        "    xb, yb = get_batch('train')\n",
        "\n",
        "    # evaluate the loss\n",
        "    logits, loss = model(xb, yb)\n",
        "    optimizer.zero_grad(set_to_none=True)\n",
        "    loss.backward()\n",
        "    optimizer.step()\n",
        "\n",
        "# generate from the model\n",
        "context = torch.zeros((1, 1), dtype=torch.long, device=device )\n",
        "print(decode(m.generate(context, max_new_tokens=2000)[0].tolist()))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "WYnRTqPbFXHy",
        "outputId": "d625a959-7490-4a84-e692-600da91e0ef9"
      },
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "0.209729 M parameters\n",
            "step 0: train loss 4.4116, val loss 4.4022\n",
            "step 100: train loss 2.6568, val loss 2.6670\n",
            "step 200: train loss 2.5090, val loss 2.5059\n",
            "step 300: train loss 2.4196, val loss 2.4338\n",
            "step 400: train loss 2.3503, val loss 2.3565\n",
            "step 500: train loss 2.2966, val loss 2.3129\n",
            "step 600: train loss 2.2410, val loss 2.2500\n",
            "step 700: train loss 2.2051, val loss 2.2191\n",
            "step 800: train loss 2.1640, val loss 2.1874\n",
            "step 900: train loss 2.1251, val loss 2.1515\n",
            "step 1000: train loss 2.1023, val loss 2.1291\n",
            "step 1100: train loss 2.0699, val loss 2.1192\n",
            "step 1200: train loss 2.0375, val loss 2.0797\n",
            "step 1300: train loss 2.0259, val loss 2.0647\n",
            "step 1400: train loss 1.9924, val loss 2.0362\n",
            "step 1500: train loss 1.9700, val loss 2.0304\n",
            "step 1600: train loss 1.9631, val loss 2.0476\n",
            "step 1700: train loss 1.9412, val loss 2.0131\n",
            "step 1800: train loss 1.9097, val loss 1.9960\n",
            "step 1900: train loss 1.9101, val loss 1.9882\n",
            "step 2000: train loss 1.8867, val loss 1.9976\n",
            "step 2100: train loss 1.8720, val loss 1.9754\n",
            "step 2200: train loss 1.8588, val loss 1.9606\n",
            "step 2300: train loss 1.8542, val loss 1.9525\n",
            "step 2400: train loss 1.8424, val loss 1.9464\n",
            "step 2500: train loss 1.8173, val loss 1.9455\n",
            "step 2600: train loss 1.8256, val loss 1.9388\n",
            "step 2700: train loss 1.8116, val loss 1.9350\n",
            "step 2800: train loss 1.8056, val loss 1.9214\n",
            "step 2900: train loss 1.8040, val loss 1.9300\n",
            "step 3000: train loss 1.7974, val loss 1.9205\n",
            "step 3100: train loss 1.7694, val loss 1.9157\n",
            "step 3200: train loss 1.7539, val loss 1.9115\n",
            "step 3300: train loss 1.7571, val loss 1.9071\n",
            "step 3400: train loss 1.7531, val loss 1.8954\n",
            "step 3500: train loss 1.7368, val loss 1.8918\n",
            "step 3600: train loss 1.7274, val loss 1.8884\n",
            "step 3700: train loss 1.7301, val loss 1.8819\n",
            "step 3800: train loss 1.7210, val loss 1.8938\n",
            "step 3900: train loss 1.7260, val loss 1.8750\n",
            "step 4000: train loss 1.7122, val loss 1.8554\n",
            "step 4100: train loss 1.7129, val loss 1.8717\n",
            "step 4200: train loss 1.7041, val loss 1.8634\n",
            "step 4300: train loss 1.6986, val loss 1.8434\n",
            "step 4400: train loss 1.7052, val loss 1.8605\n",
            "step 4500: train loss 1.6881, val loss 1.8467\n",
            "step 4600: train loss 1.6849, val loss 1.8318\n",
            "step 4700: train loss 1.6833, val loss 1.8449\n",
            "step 4800: train loss 1.6686, val loss 1.8472\n",
            "step 4900: train loss 1.6719, val loss 1.8425\n",
            "step 4999: train loss 1.6619, val loss 1.8215\n",
            "\n",
            "And they bride will to lay be madie;\n",
            "Thou but take O-dam the change:\n",
            "Warth full him tother dilth ane away, my fears,\n",
            "You have was them of is heart mile,\n",
            "You, and if ensmy contlatist, drov the does me now that\n",
            "just, lesing that.\n",
            "His my now, you up; and the tyby love.\n",
            "In Bodiet, and whom\n",
            "that demperakenous, so what evily well my\n",
            "Murtus censurence of him the reshep and thrust for to imper my monte in Mont,\n",
            "To fight? gry of thy hourb! stiddy as\n",
            "ards bearing her broint must are no Runnts\n",
            "Infortuce will me not be arm.\n",
            "You contrantymes have myse.-\n",
            "And fortwerle madam them may in son, live body.\n",
            "\n",
            "Think you:\n",
            "It stay might. \n",
            "CLAMENCE:\n",
            "My whilesse everew in movet, if Cassce of's counted;\n",
            "How what make you fear tals: the gold my sun?\n",
            "What, loudy forgor man our him.\n",
            "I will were but with some. Povinly Ford the welcont.\n",
            "\n",
            "QUEEN FIDILIZ:\n",
            "No?\n",
            "Their him the not.\n",
            "\n",
            "POLIXENENE:\n",
            "But to me, God no now the summe wip.\n",
            "\n",
            "GROMPEO:\n",
            "Conguit, bruke this belike, on so han the bodiet.\n",
            "\n",
            "CORIOLANUS:\n",
            "Till the;\n",
            "you wellseers I am with you,\n",
            "For I hust no where Mustconce, do wind that I am nobly.\n",
            "\n",
            "BRUSTHORD:\n",
            "O, wenterings so me worting.\n",
            "\n",
            "GRUMIO:\n",
            "O thus favour now,\n",
            "An bear was all beenIn\n",
            "Before and to the sever--and.\n",
            "In to dot me, to liberfeleing breamn'd my have\n",
            "epince, if that jutcey's leve,\n",
            "That Tumselfly there's little ofjess the vown;\n",
            "Maughter armied maste love in stide belothy dong'd the not.\n",
            "\n",
            "BENVOLIO:\n",
            "Well cavonzy to I have must aboe;\n",
            "I now, I thinke numt om Three teny, delelige,\n",
            "And yet our son one old, we\n",
            "ell sment on you; and plock, say, as If have to kavidess corby?\n",
            "Then eteep; upose worth\n",
            "But arm one wall preven him there.\n",
            "\n",
            "BUCKINGHARD\n",
            "\n",
            "IVIRHAMIUS:\n",
            "Why, unere to-marrow thy sathe court his in on\n",
            "some no, God the have blay not, these wife it:\n",
            "The that hear I, thou with art, lives?\n",
            "\n",
            "LARY:\n",
            "Our while with you\n",
            "That I horrtw'd will theirs is.\n",
            "Why, I would I drue, and was father,--\n",
            "'Tensis, thy promb, many and sentry talbatt.\n",
            "\n",
            "PORDINCE:\n",
            "Why Riparding:\n",
            "In is shown's fortunds, but whom the brike our all\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "i8lCFzYGMkBk"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}