{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from models import etl"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"etl.main(json_path='data/single_video.json', db='data/single_video.db', batch_size=5, overlap=2)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"from models import etl"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"client = chromadb.PersistentClient('data/single_video.db')\n",
"collection = client.get_collection('huberman_videos')\n",
"# collection.count()\n",
"# collection.peek()\n",
"\n",
"query_text = \"What are the components of an LLM?\"\n",
"# Embed the query with the same model used at ingest time and pass the\n",
"# precomputed embedding, so the query is not re-embedded by Chroma's\n",
"# default embedding function.\n",
"query_embedding = etl.embed_text(query_text)\n",
"results = collection.query(query_embeddings=[query_embedding], n_results=5)"
]
},
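{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch (optional sanity check, assuming etl.embed_text was also used at\n",
"# ingest time): confirm a stored embedding and the query embedding have the\n",
"# same dimensionality, so the distances returned below are meaningful.\n",
"stored = collection.get(limit=1, include=['embeddings'])\n",
"assert len(stored['embeddings'][0]) == len(query_embedding)"
]
},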
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ids': [['5sLYAQS9sWQ__33',\n",
" '5sLYAQS9sWQ__36',\n",
" '5sLYAQS9sWQ__3',\n",
" '5sLYAQS9sWQ__6',\n",
" '5sLYAQS9sWQ__27']],\n",
" 'distances': [[0.27329726119651687,\n",
" 0.3594438065792097,\n",
" 0.4730243492988927,\n",
" 0.5004446084705303,\n",
" 0.5766584257317211]],\n",
" 'metadatas': [[{'segment_id': '5sLYAQS9sWQ__33',\n",
" 'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=145.328s',\n",
" 'title': 'How Large Language Models Work',\n",
" 'video_id': '5sLYAQS9sWQ'},\n",
" {'segment_id': '5sLYAQS9sWQ__36',\n",
" 'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=154.367s',\n",
" 'title': 'How Large Language Models Work',\n",
" 'video_id': '5sLYAQS9sWQ'},\n",
" {'segment_id': '5sLYAQS9sWQ__3',\n",
" 'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=10.783s',\n",
" 'title': 'How Large Language Models Work',\n",
" 'video_id': '5sLYAQS9sWQ'},\n",
" {'segment_id': '5sLYAQS9sWQ__6',\n",
" 'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=22.544s',\n",
" 'title': 'How Large Language Models Work',\n",
" 'video_id': '5sLYAQS9sWQ'},\n",
" {'segment_id': '5sLYAQS9sWQ__27',\n",
" 'source': 'https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=117.572s',\n",
" 'title': 'How Large Language Models Work',\n",
" 'video_id': '5sLYAQS9sWQ'}]],\n",
" 'embeddings': None,\n",
" 'documents': [['All right, so how do they work? Well, we can think of it like this. LLM equals three things: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM.',\n",
" \"data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM. Now, we've already discussed the enormous amounts of text data that goes into these things. As for the architecture, this is a neural network and for GPT that is a transformer.\",\n",
" 'And I\\'ve been using GPT in its various forms for years. In this video we are going to number 1, ask \"what is an LLM?\" Number 2, we are going to describe how they work. And then number 3,',\n",
" 'Number 2, we are going to describe how they work. And then number 3, we\\'re going to ask, \"what are the business applications of LLMs?\" So let\\'s start with number 1, \"what is a large language model?\" Well, a large language model',\n",
" \"Yeah, that's truly a lot of text. And LLMs are also among the biggest models when it comes to parameter count. A parameter is a value the model can change independently as it learns, and the more parameters a model has, the more complex it can be. GPT-3, for example, is pre-trained on a corpus of actually 45 terabytes of data,\"]],\n",
" 'uris': None,\n",
" 'data': None}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results"
]
},
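{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: Chroma nests each result field one inner list per query, hence the\n",
"# [0] index. Pair each retrieved chunk with its distance and timestamped\n",
"# source URL for a quick relevance scan.\n",
"for doc, dist, meta in zip(results['documents'][0],\n",
"                           results['distances'][0],\n",
"                           results['metadatas'][0]):\n",
"    print(f\"{dist:.3f}  {meta['source']}\")\n",
"    print(doc[:80] + '...')\n",
"    print()"
]
},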
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CONTEXT: All right, so how do they work? Well, we can think of it like this. LLM equals three things: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM.\n",
"TITLE: How Large Language Models Work\n",
"SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=145.328s\n",
"\n",
"CONTEXT: data, architecture, and lastly, we can think of it as training. Those three things are really the components of an LLM. Now, we've already discussed the enormous amounts of text data that goes into these things. As for the architecture, this is a neural network and for GPT that is a transformer.\n",
"TITLE: How Large Language Models Work\n",
"SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=154.367s\n",
"\n",
"CONTEXT: And I've been using GPT in its various forms for years. In this video we are going to number 1, ask \"what is an LLM?\" Number 2, we are going to describe how they work. And then number 3,\n",
"TITLE: How Large Language Models Work\n",
"SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=10.783s\n",
"\n",
"CONTEXT: Number 2, we are going to describe how they work. And then number 3, we're going to ask, \"what are the business applications of LLMs?\" So let's start with number 1, \"what is a large language model?\" Well, a large language model\n",
"TITLE: How Large Language Models Work\n",
"SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=22.544s\n",
"\n",
"CONTEXT: Yeah, that's truly a lot of text. And LLMs are also among the biggest models when it comes to parameter count. A parameter is a value the model can change independently as it learns, and the more parameters a model has, the more complex it can be. GPT-3, for example, is pre-trained on a corpus of actually 45 terabytes of data,\n",
"TITLE: How Large Language Models Work\n",
"SOURCE: https://www.youtube.com/watch?v=5sLYAQS9sWQ&t=117.572s\n",
"\n",
"\n"
]
}
],
"source": [
"from models.llm import format_context\n",
"\n",
"print(format_context(results))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}