{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from models import etl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "etl.main(json_path='data/single_video.json', db='data/single_video.db', batch_size=5, overlap=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import chromadb\n",
    "from models import etl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = chromadb.PersistentClient('data/single_video.db')\n",
    "collection = client.get_collection('huberman_videos')\n",
    "# collection.count()\n",
    "# collection.peek()\n",
    "\n",
    "query_text = \"What are the components of an LLM?\"\n",
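    "# Note: query_embedding is computed here but not passed to the query below;\n",
    "# with query_texts, Chroma embeds the text using the collection's embedding function.\n",
    "query_embedding = etl.embed_text(query_text)\n",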
    "results = collection.query(query_texts=[query_text], n_results=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'ids': [['5sLYAQS9sWQ__33',\n",
       "   '5sLYAQS9sWQ__36',\n",
       "   '5sLYAQS9sWQ__3',\n",
       "   '5sLYAQS9sWQ__6',\n",
       "   '5sLYAQS9sWQ__27']],\n",
       " 'distances': [[0.27329726119651687,\n",
       "   0.3594438065792097,\n",
       "   0.4730243492988927,\n",
       "   0.5004446084705303,\n",
       "   0.5766584257317211]],\n",
       " 'metadatas': [[{'segment_id': '5sLYAQS9sWQ__33',\n",
       "    'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=145.328s',\n",
       "    'title': 'How Large Language Models Work',\n",
       "    'video_id': '5sLYAQS9sWQ'},\n",
       "   {'segment_id': '5sLYAQS9sWQ__36',\n",
       "    'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=154.367s',\n",
       "    'title': 'How Large Language Models Work',\n",
       "    'video_id': '5sLYAQS9sWQ'},\n",
       "   {'segment_id': '5sLYAQS9sWQ__3',\n",
       "    'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=10.783s',\n",
       "    'title': 'How Large Language Models Work',\n",
       "    'video_id': '5sLYAQS9sWQ'},\n",
       "   {'segment_id': '5sLYAQS9sWQ__6',\n",
       "    'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=22.544s',\n",
       "    'title': 'How Large Language Models Work',\n",
       "    'video_id': '5sLYAQS9sWQ'},\n",
       "   {'segment_id': '5sLYAQS9sWQ__27',\n",
       "    'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=117.572s',\n",
       "    'title': 'How Large Language Models Work',\n",
       "    'video_id': '5sLYAQS9sWQ'}]],\n",
       " 'embeddings': None,\n",
       " 'documents': [['All right, so how do they work? Well, we can think of it like this. LLM equals three things: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM.',\n",
       "   \"data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM. Now, we've already discussed the enormous amounts of text data that goes into these things. As for the architecture, this is a neural network and for GPT that is a transformer.\",\n",
       "   'And I\\'ve been using GPT in its various forms for years. In this video we are going to number 1, ask \"what is an LLM?\" Number 2, we are going to describe how they work. And then number 3,',\n",
       "   'Number 2, we are going to describe how they work. And then number 3, we\\'re going to ask, \"what are the business applications of LLMs?\" So let\\'s start with number 1, \"what is a large language model?\" Well, a large language model',\n",
       "   \"Yeah, that's truly a lot of text. And LLMs are also among the biggest models when it comes to parameter count. A parameter is a value the model can change independently as it learns, and the more parameters a model has, the more complex it can be. GPT-3, for example, is pre-trained on a corpus of actually 45 terabytes of data,\"]],\n",
       " 'uris': None,\n",
       " 'data': None}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CONTEXT: All right, so how do they work? Well, we can think of it like this. LLM equals three things: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM.\n",
      "TITLE: How Large Language Models Work\n",
      "SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=145.328s\n",
      "\n",
      "CONTEXT: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM. Now, we've already discussed the enormous amounts of text data that goes into these things. As for the architecture, this is a neural network and for GPT that is a transformer.\n",
      "TITLE: How Large Language Models Work\n",
      "SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=154.367s\n",
      "\n",
      "CONTEXT: And I've been using GPT in its various forms for years. In this video we are going to number 1, ask \"what is an LLM?\" Number 2, we are going to describe how they work. And then number 3,\n",
      "TITLE: How Large Language Models Work\n",
      "SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=10.783s\n",
      "\n",
      "CONTEXT: Number 2, we are going to describe how they work. And then number 3, we're going to ask, \"what are the business applications of LLMs?\" So let's start with number 1, \"what is a large language model?\" Well, a large language model\n",
      "TITLE: How Large Language Models Work\n",
      "SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=22.544s\n",
      "\n",
      "CONTEXT: Yeah, that's truly a lot of text. And LLMs are also among the biggest models when it comes to parameter count. A parameter is a value the model can change independently as it learns, and the more parameters a model has, the more complex it can be. GPT-3, for example, is pre-trained on a corpus of actually 45 terabytes of data,\n",
      "TITLE: How Large Language Models Work\n",
      "SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=117.572s\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from models.llm import format_context\n",
    "\n",
    "print(format_context(results))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}