{ "cells": [ { "cell_type": "markdown", "id": "68e9310f-109d-4f30-b263-d1e6c058ee80", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "code", "execution_count": 1, "id": "6805b3b5-782b-437c-82b3-9392abb5a599", "metadata": { "tags": [] }, "outputs": [], "source": [ "# %pip install -q -r requirements.txt" ] }, { "cell_type": "markdown", "id": "94f0fcdd-1653-440e-8ebc-9c33d931163a", "metadata": {}, "source": [ "## Config" ] }, { "cell_type": "code", "execution_count": 2, "id": "5d0bd22f-293e-4c15-9dfe-8070553f42b5", "metadata": { "tags": [] }, "outputs": [], "source": [ "INPUT_DATASET = 'derek-thomas/labeled-multiple-choice-explained-falcon-reasoning'\n", "REVISION = '536f3b8'\n", "OUTPUT_DATASET = 'derek-thomas/labeled-multiple-choice-explained-falcon-tokenized'" ] }, { "cell_type": "code", "execution_count": 3, "id": "a1fc7a29-6b60-446d-b708-012f897de6a9", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e5eac788b31c41f09c2e95ef695b63b8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='
\n", "" ], "text/plain": [ " formatted_question \\\n", "0 what is satellite technology used for predicti... \n", "1 what does irradiating food do? (a) relieve pai... \n", "2 what protects a mammal's skin? (a) fiber folli... \n", "3 what do earthworms do when a segment breaks of... \n", "4 lightning can be bad for what? (a) the environ... \n", "... ... \n", "8408 organisms that can cause infection do what? (a... \n", "8409 fungi are living things that cannot make thei... \n", "8410 an overheated body can use water for: (a) meta... \n", "8411 what is essential for cellular respiration for... \n", "8412 what helps insulate and protect the body? (a) ... \n", "\n", " combined_fact answer_key \\\n", "0 satellite technology is used for predicting wh... c \n", "1 irradiated food improves food safety. c \n", "2 fiber follicles protect mammal skin a \n", "3 earthworms can regrow segments that break off b \n", "4 lightning can be bad for the environment. a \n", "... ... ... \n", "8408 organisms that can cause infection make humans... g \n", "8409 fungi are living things that cannot make their... a \n", "8410 the evaporation of water from the skin cools t... g \n", "8411 plants are essential for cellular respiration ... f \n", "8412 living cells in follicles help insulate and pr... b \n", "\n", " topic gpt3_5_reasoning \\\n", "0 Technology a) Seconds and minutes: This option is incorre... \n", "1 Food science (a) Relieve pain: This option is not correct b... \n", "2 Biology b) Exfoliation: Exfoliation is the process of ... \n", "3 Biology a) Dies: This option is not correct because ea... \n", "4 Electricity b) Rainstorms: Lightning is actually a natural... \n", "... ... ... \n", "8408 Biology a) Bandaging open sores is not the correct ans... \n", "8409 Biology b) Fungi are living things that can make their... \n", "8410 Biology a) Metabolic reaction: This option is incorrec... \n", "8411 Biology a) Electrons are involved in cellular respirat... 
\n", "8412 Biology a) H2O: Water is essential for life, but it do... \n", "\n", " question_text \\\n", "0 What is satellite technology used for predicting? \n", "1 What does irradiating food do? \n", "2 What protects a mammal's skin? \n", "3 What do earthworms do when a segment breaks off? \n", "4 Lightning can be bad for what? \n", "... ... \n", "8408 Organisms that can cause infection do what? \n", "8409 Fungi are living things that cannot make their... \n", "8410 An overheated body can use water for:? \n", "8411 What is essential for cellular respiration for... \n", "8412 What helps insulate and protect the body? \n", "\n", " answer_choices \\\n", "0 (a) Seconds and minutes (b) The strength and m... \n", "1 (a) Relieve pain (b) Enhance food's nutrients ... \n", "2 (a) Fiber follicles (b) Exfoliation (c) Resist... \n", "3 (a) Dies (b) Regrows it (c) Reproduces (d) Sed... \n", "4 (a) The environment (b) Rainstorms (c) Destruc... \n", "... ... \n", "8408 (a) Bandage open sores (b) Keep flesh clean (c... \n", "8409 (a) Food (b) Cells (c) Energy (d) Fruits (e) H... \n", "8410 (a) Metabolic reaction (b) Dehydrating (c) Rai... \n", "8411 (a) Electron (b) Glucose (c) Energy (d) Energy... \n", "8412 (a) H2o (b) Living cells in follicles (c) Laye... \n", "\n", " falcon_reasoning \n", "0 - (a) Seconds and minutes: Satellite technolog... \n", "1 (a) Relieve pain: Irradiating food does not ha... \n", "2 (a) **Fiber follicles**: This is the correct a... \n", "3 1. **Option (a): Dies**\\n - Earthworms are s... \n", "4 (a) The environment: Lightning can release lar... \n", "... ... \n", "8408 (a) Bandage open sores: This action is typical... \n", "8409 1. **Read the question and options carefully.*... \n", "8410 - (a) Metabolic reaction: This is incorrect be... \n", "8411 1. **Glucose (b)**: Glucose is one of the reac... \n", "8412 1. **Read the question and options carefully.*... 
\n", "\n", "[8413 rows x 8 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load dataset from Hugging Face Hub\n", "dataset = load_dataset(INPUT_DATASET, split='train')\n", "\n", "# Convert to pandas dataframe\n", "df = dataset.to_pandas()\n", "print(f\"Before Cleaning: {len(df)} rows\")\n", "print(df.columns)\n", "\n", "# Drop the 'falcon_reasoning_prompt' column if it exists\n", "df.drop(columns=['falcon_reasoning_prompt'], errors='ignore', inplace=True)\n", "\n", "# Rename 'explanation' to 'gpt3_5_reasoning' so each reasoning column names its source model\n", "df.rename(columns={\n", " 'explanation': 'gpt3_5_reasoning',\n", "}, inplace=True)\n", "\n", "# Normalize quotes: drop double quotes from the questions and use single quotes in the reasoning text\n", "df['question_text'] = df['question_text'].str.replace('\"', '', regex=False)\n", "df['gpt3_5_reasoning'] = df['gpt3_5_reasoning'].str.replace('\"', \"'\", regex=False)\n", "df['falcon_reasoning'] = df['falcon_reasoning'].str.replace('\"', \"'\", regex=False)\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "2511bc04-f611-4dc7-b3ed-e477907b0200", "metadata": {}, "source": [ "## Create Prompts from Processed Data" ] }, { "cell_type": "markdown", "id": "d124c7cf-a369-46a9-94db-069894145959", "metadata": {}, "source": [ "We need to convert each sample into a format similar to the one below for each of the scenarios. This is ideal since we can use [chat templates](https://huggingface.co/docs/transformers/en/chat_templating) to switch easily between models, which may use different special tokens.\n", "\n", "```\n", "[\n", " {\"content\": system_prompt, \"role\": \"system\"},\n", " {\"content\": user_content, \"role\": \"user\"},\n", " {\"content\": assistant_response, \"role\": \"assistant\"}\n", "]\n", "```\n", "\n", "We should include a helpful `system_prompt` with a general trivia prefix, and a suffix that contains instructions that fit each scenario.\n", "The `user_content` will hold the question and answer choices.\n", "The `assistant_response` should reflect the scenario. 
" ] }, { "cell_type": "markdown", "id": "c85b3c11-18d7-4854-a0ba-ad0c1407fd6d", "metadata": {}, "source": [ "Its best to understand `template_blocks` in a couple layers. \n", "- The top layer (macro) allows me to decide which pieces I want to include. Sometimes I want just the `system`+`user` message, and for fine-tuning Ill want `system`+`user`+`assistant`\n", "- System+User:\n", " - Inside the this layer I use jinja to interpolate the values I want to add\n", " - I moved `user_content` out to get a feel for how it looks\n", "- Assistant:\n", " - Here we have an if statement to allow me to chose between FA, RFA and FAR\n", " - Inside that we just have the same interploation as seen elsewhere\n", "\n", "You can see in `initial` and `full` the json for the messages structure. Here Im selecting which macros I want to use." ] }, { "cell_type": "code", "execution_count": 7, "id": "f42f3c34-f736-4e1c-b904-418caf2b0de1", "metadata": {}, "outputs": [], "source": [ "from jinja2 import Environment, DictLoader\n", "\n", "template_blocks = '''\n", "{%- macro user_message(system_content, question_text, answer_choices) -%}\n", "{\n", " \"role\": \"system\",\n", " \"content\": {{ system_content }}\n", "},\n", "{\n", " \"role\": \"user\",\n", " \"content\": \"Question: {{ question_text }}\\\\nAnswer Choices: {{ answer_choices }}\"\n", "\n", "}\n", "{%- endmacro %}\n", "\n", "{% macro assistant_response(reasoning, answer_key, response_order='default') -%}\n", "{\n", " \"role\": \"assistant\",\n", " \"content\": {\n", " {% if response_order == 'rfa' -%}\n", " \"reasoning\": {{ reasoning | tojson }},\n", " \"final_answer\": \"{{ answer_key }}\"\n", " {% elif response_order == 'far' -%}\n", " \"final_answer\": \"{{ answer_key }}\",\n", " \"reasoning\": {{ reasoning | tojson }}\n", " {% else -%}\n", " \"final_answer\": \"{{ answer_key }}\"\n", " {% endif %}\n", " }\n", "}\n", "{%- endmacro %}\n", "'''\n", "\n", "# System + User only (initial template)\n", "initial = '''\n", "[\n", " 
{{ user_message(system_content, question_text, answer_choices) }}\n", "]\n", "'''\n", "\n", "# Full conversation template\n", "full = '''\n", "[\n", " {{ user_message(system_content, question_text, answer_choices) }},\n", " {{ assistant_response(reasoning, answer_key, response_order) }}\n", "]\n", "'''\n", "\n", "# Create Jinja environment and load templates\n", "env = Environment(loader=DictLoader({\n", " 'template_blocks': template_blocks,\n", " 'initial': initial,\n", " 'full': full\n", "}))\n", "\n", "# Load the macro definitions into the environment\n", "macro_template = env.get_template('template_blocks')\n", "env.globals.update(macro_template.module.__dict__)\n", "\n", "# Compile full and initial templates\n", "full_template = env.get_template('full')\n", "initial_template = env.get_template('initial')" ] }, { "cell_type": "markdown", "id": "35b2b21b-6ecf-490d-a453-7d679e3b1877", "metadata": {}, "source": [ "### Reasoning Final Answer" ] }, { "cell_type": "code", "execution_count": 8, "id": "eccb2f71-70a9-41fc-8235-d58b8876bdf1", "metadata": {}, "outputs": [], "source": [ "rfa_system_content = 'Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.'\n", "# Serialize to a JSON string literal, since the template inserts system_content unquoted\n", "rfa_system_content = json.dumps(rfa_system_content)\n", "\n", "# System + user prompt (no assistant turn)\n", "df['user_prompt_RFA'] = df.apply(lambda row: initial_template.render(\n", " system_content=rfa_system_content,\n", " question_text=row['question_text'],\n", " answer_choices=row['answer_choices']\n", "), axis=1)\n", "df['user_prompt_RFA'] = df['user_prompt_RFA'].apply(json.loads)" ] }, { "cell_type": "markdown", "id": "778538da-9290-4815-b792-6c632f3d398f", "metadata": {}, "source": [ "#### RFA ChatGPT 3.5 Example" ] }, { "cell_type": "code", "execution_count": 9, "id": "bb6bf32e-9d2c-40a4-a10c-c1c3df16bf1f", "metadata": { "scrolled": true }, "outputs": [], "source": [ "def generate_full_conversation(row, reasoning_key):\n", " rfa_template_input = {\n", " 
'system_content': rfa_system_content,\n", " 'question_text': row['question_text'],\n", " 'answer_choices': row['answer_choices'],\n", " 'answer_key': row['answer_key'],\n", " 'response_order': 'rfa'\n", " }\n", " return full_template.render(**rfa_template_input, reasoning=row[reasoning_key])\n", "\n", "# Full Conversation GPT3.5\n", "df['conversation_RFA_gpt3_5'] = df.apply(lambda row: generate_full_conversation(row, 'gpt3_5_reasoning'), axis=1)\n", "df['conversation_RFA_gpt3_5'] = df['conversation_RFA_gpt3_5'].apply(json.loads)\n", "\n", "# Full Conversation Falcon\n", "df['conversation_RFA_falcon'] = df.apply(lambda row: generate_full_conversation(row, 'falcon_reasoning'), axis=1)\n", "df['conversation_RFA_falcon'] = df['conversation_RFA_falcon'].apply(json.loads)" ] }, { "cell_type": "code", "execution_count": 10, "id": "c22bae14-d5c2-4ed0-8d52-f98d6a4f24b2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.\n", "---\n", "Question: What is satellite technology used for predicting?\n", "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n", "---\n", "dict_keys(['reasoning', 'final_answer'])\n", "---\n", "a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\n", "\n", "b) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. 
Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\n", "\n", "d) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\n", "\n", "e) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\n", "\n", "f) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\n", "\n", "g) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\n", "\n", "h) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. 
While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\n", "\n", "Therefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\n" ] } ], "source": [ "rfa_test_row = df.conversation_RFA_gpt3_5.iloc[0]\n", "print(rfa_test_row[0]['content'])\n", "print('---')\n", "print(rfa_test_row[1]['content'])\n", "print('---')\n", "print(rfa_test_row[2]['content'].keys())\n", "print('---')\n", "print(rfa_test_row[2]['content']['reasoning'])" ] }, { "cell_type": "markdown", "id": "5f1ca469-473f-423d-b673-a7e7278f9bbb", "metadata": {}, "source": [ "#### RFA Falcon Example" ] }, { "cell_type": "code", "execution_count": 11, "id": "f6b86fc1-beb0-40c8-9293-0824b7926b7b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.\n", "---\n", "Question: What is satellite technology used for predicting?\n", "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n", "---\n", "dict_keys(['reasoning', 'final_answer'])\n", "---\n", "- (a) Seconds and minutes: Satellite technology is not used to predict seconds and minutes. This is too specific and not what satellite technology is generally used for. 
Satellite technology is used for broader time scales, such as days, weeks, months, and years.\n", "- (b) The strength and magnitude of an earthquake: While some types of satellite data can be used in conjunction with other information to study seismicity, earthquakes themselves are typically predicted using other methods, like seismographs and geological studies.\n", "- (d) 70-75 degrees fahrenheit: This is a specific temperature range, and satellites cannot predict exact temperature ranges like this. While satellites do play a role in weather prediction, which includes temperature, predicting specific ranges within a day isn't a typical application.\n", "- (e) Rapid changes occur: While satellites can detect rapid changes in many areas, such as cloud cover, weather fronts, or volcanic activity, the phrase 'rapid changes occur' is too vague. It could apply to various phenomena, so it's not specifically about predictions related to satellite technology alone.\n", "- (f) Dead-ends and false starts: These descriptions relate to human behavior and decision-making processes, not predictions associated with satellite technology.\n", "- (g) Snow, ice, and rock: This is a very specific combination of weather and geological features. While satellite technology can certainly be used to monitor snow, ice, and rock & soil changes (for landslide detection, for example), it's not specifically about predicting these phenomena in the way the question suggests.\n", "- (h) Around 5 to 27 degrees celsius: Similar to (d), this is a specific temperature range, and satellites alone cannot predict specific temperature ranges so precisely.\n", "\n", "The correct answer, (c) What it's like outside each day, touches on a broader range of environmental conditions, which satellite technology can provide data on. Satellites track weather patterns, cloud cover, sunlight, and other indicators that give us an idea of the outside conditions almost daily. 
\n", "\n", "This broad, daily overview of environmental conditions is something that satellites can provide, unlike the more narrow and specific predictions mentioned in the other options. Although it's not a technical 'prediction' in the sense of forecasting exact events all the time, it does provide up-to-date and current information about what it's like outside, which can be used to make informed decisions and inform other predictions related to weather and environment.\n" ] } ], "source": [ "rfa_test_row = df.conversation_RFA_falcon.iloc[0]\n", "print(rfa_test_row[0]['content'])\n", "print('---')\n", "print(rfa_test_row[1]['content'])\n", "print('---')\n", "print(rfa_test_row[2]['content'].keys())\n", "print('---')\n", "print(rfa_test_row[2]['content']['reasoning'])" ] }, { "cell_type": "markdown", "id": "e475bc17-03b4-45b7-9188-f60110325eff", "metadata": {}, "source": [ "At this point we should feel pretty comfortable with our prompt, so let's repeat this for `FAR` and `FA`." ] }, { "cell_type": "markdown", "id": "a1150ae9-cc22-44c6-8a36-0e3d7c16111b", "metadata": {}, "source": [ "### Final Answer Reasoning Structured Generation" ] }, { "cell_type": "code", "execution_count": 12, "id": "e5437ce4-f6be-4b0e-88e8-d87f8256945e", "metadata": {}, "outputs": [], "source": [ "far_system_content = 'Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.'\n", "far_system_content = json.dumps(far_system_content)\n", "\n", "# USER Prompt\n", "df['user_prompt_FAR'] = df.apply(lambda row: initial_template.render(\n", " system_content=far_system_content,\n", " question_text=row['question_text'],\n", " answer_choices=row['answer_choices']\n", "), axis=1)\n", "df['user_prompt_FAR'] = df['user_prompt_FAR'].apply(json.loads)" ] }, { "cell_type": "code", "execution_count": 13, "id": "f3393931-e8d8-48d7-8734-a6a8f6afc032", "metadata": { "scrolled": true }, "outputs": [], "source": [ "def generate_full_conversation(row, 
reasoning_key):\n", " far_template_input = {\n", " 'system_content': far_system_content,\n", " 'question_text': row['question_text'],\n", " 'answer_choices': row['answer_choices'],\n", " 'answer_key': row['answer_key'],\n", " 'response_order': 'far'\n", " }\n", " return full_template.render(**far_template_input, reasoning=row[reasoning_key])\n", "\n", "# Full Conversation GPT3.5\n", "df['conversation_FAR_gpt3_5'] = df.apply(lambda row: generate_full_conversation(row, 'gpt3_5_reasoning'), axis=1)\n", "df['conversation_FAR_gpt3_5'] = df['conversation_FAR_gpt3_5'].apply(json.loads)\n", "\n", "# Full Conversation Falcon\n", "df['conversation_FAR_falcon'] = df.apply(lambda row: generate_full_conversation(row, 'falcon_reasoning'), axis=1)\n", "df['conversation_FAR_falcon'] = df['conversation_FAR_falcon'].apply(json.loads)" ] }, { "cell_type": "markdown", "id": "ef90916f-c37e-4684-9ce6-f89e08158403", "metadata": {}, "source": [ "#### FAR ChatGPT 3.5 Example" ] }, { "cell_type": "code", "execution_count": 14, "id": "24dcc34d-6c78-42bb-b24f-ee715f27f405", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.\n", "---\n", "Question: What is satellite technology used for predicting?\n", "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n", "---\n", "dict_keys(['final_answer', 'reasoning'])\n", "---\n", "a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. 
Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\n", "\n", "b) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\n", "\n", "d) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\n", "\n", "e) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\n", "\n", "f) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\n", "\n", "g) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\n", "\n", "h) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. 
While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\n", "\n", "Therefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\n" ] } ], "source": [ "far_test_row = df.conversation_FAR_gpt3_5.iloc[0]\n", "print(far_test_row[0]['content'])\n", "print('---')\n", "print(far_test_row[1]['content'])\n", "print('---')\n", "print(far_test_row[2]['content'].keys())\n", "print('---')\n", "print(far_test_row[2]['content']['reasoning'])" ] }, { "cell_type": "markdown", "id": "f76c6fd3-149e-4db1-b333-8e2c451286cb", "metadata": {}, "source": [ "#### FAR Falcon Example" ] }, { "cell_type": "code", "execution_count": 15, "id": "c28f8e7e-bce0-4855-9d25-42a1f5e9b0cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.\n", "---\n", "Question: What is satellite technology used for predicting?\n", "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n", "---\n", "dict_keys(['final_answer', 'reasoning'])\n", "---\n", "- (a) Seconds and minutes: Satellite technology is not used to predict seconds and minutes. This is too specific and not what satellite technology is generally used for. 
Satellite technology is used for broader time scales, such as days, weeks, months, and years.\n", "- (b) The strength and magnitude of an earthquake: While some types of satellite data can be used in conjunction with other information to study seismicity, earthquakes themselves are typically predicted using other methods, like seismographs and geological studies.\n", "- (d) 70-75 degrees fahrenheit: This is a specific temperature range, and satellites cannot predict exact temperature ranges like this. While satellites do play a role in weather prediction, which includes temperature, predicting specific ranges within a day isn't a typical application.\n", "- (e) Rapid changes occur: While satellites can detect rapid changes in many areas, such as cloud cover, weather fronts, or volcanic activity, the phrase 'rapid changes occur' is too vague. It could apply to various phenomena, so it's not specifically about predictions related to satellite technology alone.\n", "- (f) Dead-ends and false starts: These descriptions relate to human behavior and decision-making processes, not predictions associated with satellite technology.\n", "- (g) Snow, ice, and rock: This is a very specific combination of weather and geological features. While satellite technology can certainly be used to monitor snow, ice, and rock & soil changes (for landslide detection, for example), it's not specifically about predicting these phenomena in the way the question suggests.\n", "- (h) Around 5 to 27 degrees celsius: Similar to (d), this is a specific temperature range, and satellites alone cannot predict specific temperature ranges so precisely.\n", "\n", "The correct answer, (c) What it's like outside each day, touches on a broader range of environmental conditions, which satellite technology can provide data on. Satellites track weather patterns, cloud cover, sunlight, and other indicators that give us an idea of the outside conditions almost daily. 
\n", "\n", "This broad, daily overview of environmental conditions is something that satellites can provide, unlike the more narrow and specific predictions mentioned in the other options. Although it's not a technical 'prediction' in the sense of forecasting exact events all the time, it does provide up-to-date and current information about what it's like outside, which can be used to make informed decisions and inform other predictions related to weather and environment.\n" ] } ], "source": [ "far_test_row = df.conversation_FAR_falcon.iloc[0]\n", "print(far_test_row[0]['content'])\n", "print('---')\n", "print(far_test_row[1]['content'])\n", "print('---')\n", "print(far_test_row[2]['content'].keys())\n", "print('---')\n", "print(far_test_row[2]['content']['reasoning'])" ] }, { "cell_type": "markdown", "id": "2c9f0619-1467-4b4e-ad69-b5ca2c4e58a7", "metadata": {}, "source": [ "### Final Answer Structured Generation" ] }, { "cell_type": "code", "execution_count": 16, "id": "5a48d4a6-66bd-4941-9227-720fb7cde805", "metadata": {}, "outputs": [], "source": [ "fa_system_content = 'Answer the Question and include your Final Answer in a json like: {\"final_answer\": }.'\n", "fa_system_content = json.dumps(fa_system_content)\n", "\n", "# USER Prompt\n", "df['user_prompt_FA'] = df.apply(lambda row: initial_template.render(\n", " system_content=fa_system_content,\n", " question_text=row['question_text'],\n", " answer_choices=row['answer_choices']\n", "), axis=1)\n", "df['user_prompt_FA'] = df['user_prompt_FA'].apply(json.loads)" ] }, { "cell_type": "code", "execution_count": 17, "id": "23498249-1bee-424c-a0a2-db282c4e60b5", "metadata": { "scrolled": true }, "outputs": [], "source": [ "def generate_full_conversation(row):\n", " fa_template_input = {\n", " 'system_content': fa_system_content,\n", " 'question_text': row['question_text'],\n", " 'answer_choices': row['answer_choices'],\n", " 'answer_key': row['answer_key'],\n", " 'response_order': 'fa'\n", " }\n", " return 
full_template.render(**fa_template_input)\n", "\n", "# Full conversation (final answer only, no reasoning)\n", "df['conversation_FA'] = df.apply(generate_full_conversation, axis=1)\n", "df['conversation_FA'] = df['conversation_FA'].apply(json.loads)\n" ] }, { "cell_type": "markdown", "id": "0afbe045-4361-4f92-92a0-243029151a43", "metadata": {}, "source": [ "#### FA Example" ] }, { "cell_type": "code", "execution_count": 18, "id": "9b37bcf3-0a6a-41b4-934f-932687181947", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer the Question and include your Final Answer in a json like: {\"final_answer\": }.\n", "---\n", "Question: What is satellite technology used for predicting?\n", "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n", "---\n", "{'final_answer': 'c'}\n" ] } ], "source": [ "fa_test_row = df.conversation_FA.iloc[0]\n", "print(fa_test_row[0]['content'])\n", "print('---')\n", "print(fa_test_row[1]['content'])\n", "print('---')\n", "print(fa_test_row[2]['content'])" ] }, { "cell_type": "markdown", "id": "ad51b99b-cf67-43d8-833c-5ec57d9d4613", "metadata": {}, "source": [ "### Cleanup" ] }, { "cell_type": "code", "execution_count": 19, "id": "69a687d5-35ab-4abb-8bb8-f975fa7be3f7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "Index(['formatted_question', 'combined_fact', 'answer_key', 'topic',\n", " 'gpt3_5_reasoning', 'question_text', 'answer_choices',\n", " 'falcon_reasoning', 'user_prompt_RFA', 'conversation_RFA_gpt3_5',\n", " 'conversation_RFA_falcon', 'user_prompt_FAR', 'conversation_FAR_gpt3_5',\n", " 'conversation_FAR_falcon', 'user_prompt_FA', 'conversation_FA'],\n", " dtype='object')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"df.columns" ] }, { "cell_type": "code", "execution_count": 20, "id": "6ec3b6b5-a359-4da8-98c8-4dae75b8a2d4", "metadata": { "tags": [] }, "outputs": [], "source": [ "df = df[['topic', 'question_text', 'answer_key', 'gpt3_5_reasoning', 'falcon_reasoning', 'answer_choices',\n", " 'user_prompt_RFA', 'conversation_RFA_gpt3_5', 'conversation_RFA_falcon',\n", " 'user_prompt_FAR', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon',\n", " 'user_prompt_FA', 'conversation_FA']]" ] }, { "cell_type": "code", "execution_count": 21, "id": "f9f3a5a8-e4ea-4ed9-9fc0-fdd1a19b1c92", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
topicquestion_textanswer_keygpt3_5_reasoningfalcon_reasoninganswer_choicesuser_prompt_RFAconversation_RFA_gpt3_5conversation_RFA_falconuser_prompt_FARconversation_FAR_gpt3_5conversation_FAR_falconuser_prompt_FAconversation_FA
0TechnologyWhat is satellite technology used for predicting?ca) Seconds and minutes: This option is incorre...- (a) Seconds and minutes: Satellite technolog...(a) Seconds and minutes (b) The strength and m...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
1Food scienceWhat does irradiating food do?c(a) Relieve pain: This option is not correct b...(a) Relieve pain: Irradiating food does not ha...(a) Relieve pain (b) Enhance food's nutrients ...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
2BiologyWhat protects a mammal's skin?ab) Exfoliation: Exfoliation is the process of ...(a) **Fiber follicles**: This is the correct a...(a) Fiber follicles (b) Exfoliation (c) Resist...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
3BiologyWhat do earthworms do when a segment breaks off?ba) Dies: This option is not correct because ea...1. **Option (a): Dies**\\n - Earthworms are s...(a) Dies (b) Regrows it (c) Reproduces (d) Sed...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
4ElectricityLightning can be bad for what?ab) Rainstorms: Lightning is actually a natural...(a) The environment: Lightning can release lar...(a) The environment (b) Rainstorms (c) Destruc...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
.............................................
8408BiologyOrganisms that can cause infection do what?ga) Bandaging open sores is not the correct ans...(a) Bandage open sores: This action is typical...(a) Bandage open sores (b) Keep flesh clean (c...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
8409BiologyFungi are living things that cannot make their...ab) Fungi are living things that can make their...1. **Read the question and options carefully.*...(a) Food (b) Cells (c) Energy (d) Fruits (e) H...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
8410BiologyAn overheated body can use water for:?ga) Metabolic reaction: This option is incorrec...- (a) Metabolic reaction: This is incorrect be...(a) Metabolic reaction (b) Dehydrating (c) Rai...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
8411BiologyWhat is essential for cellular respiration for...fa) Electrons are involved in cellular respirat...1. **Glucose (b)**: Glucose is one of the reac...(a) Electron (b) Glucose (c) Energy (d) Energy...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
8412BiologyWhat helps insulate and protect the body?ba) H2O: Water is essential for life, but it do...1. **Read the question and options carefully.*...(a) H2o (b) Living cells in follicles (c) Laye...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...[{'role': 'system', 'content': 'Answer the Que...
\n", "

8413 rows × 14 columns

\n", "
" ], "text/plain": [ " topic question_text \\\n", "0 Technology What is satellite technology used for predicting? \n", "1 Food science What does irradiating food do? \n", "2 Biology What protects a mammal's skin? \n", "3 Biology What do earthworms do when a segment breaks off? \n", "4 Electricity Lightning can be bad for what? \n", "... ... ... \n", "8408 Biology Organisms that can cause infection do what? \n", "8409 Biology Fungi are living things that cannot make their... \n", "8410 Biology An overheated body can use water for:? \n", "8411 Biology What is essential for cellular respiration for... \n", "8412 Biology What helps insulate and protect the body? \n", "\n", " answer_key gpt3_5_reasoning \\\n", "0 c a) Seconds and minutes: This option is incorre... \n", "1 c (a) Relieve pain: This option is not correct b... \n", "2 a b) Exfoliation: Exfoliation is the process of ... \n", "3 b a) Dies: This option is not correct because ea... \n", "4 a b) Rainstorms: Lightning is actually a natural... \n", "... ... ... \n", "8408 g a) Bandaging open sores is not the correct ans... \n", "8409 a b) Fungi are living things that can make their... \n", "8410 g a) Metabolic reaction: This option is incorrec... \n", "8411 f a) Electrons are involved in cellular respirat... \n", "8412 b a) H2O: Water is essential for life, but it do... \n", "\n", " falcon_reasoning \\\n", "0 - (a) Seconds and minutes: Satellite technolog... \n", "1 (a) Relieve pain: Irradiating food does not ha... \n", "2 (a) **Fiber follicles**: This is the correct a... \n", "3 1. **Option (a): Dies**\\n - Earthworms are s... \n", "4 (a) The environment: Lightning can release lar... \n", "... ... \n", "8408 (a) Bandage open sores: This action is typical... \n", "8409 1. **Read the question and options carefully.*... \n", "8410 - (a) Metabolic reaction: This is incorrect be... \n", "8411 1. **Glucose (b)**: Glucose is one of the reac... \n", "8412 1. **Read the question and options carefully.*... 
\n", "\n", " answer_choices \\\n", "0 (a) Seconds and minutes (b) The strength and m... \n", "1 (a) Relieve pain (b) Enhance food's nutrients ... \n", "2 (a) Fiber follicles (b) Exfoliation (c) Resist... \n", "3 (a) Dies (b) Regrows it (c) Reproduces (d) Sed... \n", "4 (a) The environment (b) Rainstorms (c) Destruc... \n", "... ... \n", "8408 (a) Bandage open sores (b) Keep flesh clean (c... \n", "8409 (a) Food (b) Cells (c) Energy (d) Fruits (e) H... \n", "8410 (a) Metabolic reaction (b) Dehydrating (c) Rai... \n", "8411 (a) Electron (b) Glucose (c) Energy (d) Energy... \n", "8412 (a) H2o (b) Living cells in follicles (c) Laye... \n", "\n", " user_prompt_RFA \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... \n", "\n", " conversation_RFA_gpt3_5 \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... 
\n", "\n", " conversation_RFA_falcon \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... \n", "\n", " user_prompt_FAR \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... \n", "\n", " conversation_FAR_gpt3_5 \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... 
\n", "\n", " conversation_FAR_falcon \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... \n", "\n", " user_prompt_FA \\\n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... \n", "\n", " conversation_FA \n", "0 [{'role': 'system', 'content': 'Answer the Que... \n", "1 [{'role': 'system', 'content': 'Answer the Que... \n", "2 [{'role': 'system', 'content': 'Answer the Que... \n", "3 [{'role': 'system', 'content': 'Answer the Que... \n", "4 [{'role': 'system', 'content': 'Answer the Que... \n", "... ... \n", "8408 [{'role': 'system', 'content': 'Answer the Que... \n", "8409 [{'role': 'system', 'content': 'Answer the Que... \n", "8410 [{'role': 'system', 'content': 'Answer the Que... \n", "8411 [{'role': 'system', 'content': 'Answer the Que... \n", "8412 [{'role': 'system', 'content': 'Answer the Que... 
\n", "\n", "[8413 rows x 14 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "9fe52e77-0985-49d4-964e-7a17372d7007", "metadata": {}, "source": [ "## Explore Prompts\n", "Gradio can be really useful for quick inline apps. Here I want to make sure everything is as I expect.\n", "\n", "While the print statements above helped me see the format, the Gradio app lets me explore a large volume of output easily.\n", "\n", "Note: It's tricky, as I can't easily render newlines in strings. So be careful!" ] }, { "cell_type": "code", "execution_count": 22, "id": "a50d9d6c-18e6-476d-9a40-ed7a3f699477", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* Running on local URL: http://127.0.0.1:7860\n", "\n", "To create a public link, set `share=True` in `launch()`.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "import gradio as gr\n", "\n", "# Gradio app to browse prompts with left and right buttons\n", "# index = 0\n", "\n", "# Functions to handle prompts\n", "def get_prompt(index, prompt_type):\n", " return df.iloc[index][prompt_type]\n", "\n", "def next_prompt(index, prompt_type):\n", " if index < len(df) - 1:\n", " index += 1\n", " return index, get_prompt(index, prompt_type)\n", "\n", "def previous_prompt(index, prompt_type):\n", " if index > 0:\n", " index -= 1\n", " return index, get_prompt(index, prompt_type)\n", "\n", "# Gradio App\n", "with gr.Blocks() as demo:\n", " gr.Markdown(\"# Prompt Browser\")\n", " with gr.Row():\n", " prompt_type_dropdown = gr.Dropdown(\n", " choices=['conversation_RFA_gpt3_5', 'conversation_RFA_falcon', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon', 'conversation_FA'],\n", " value='conversation_RFA_gpt3_5',\n", " label=\"Select Prompt Type\"\n", " )\n", " index_display = gr.Textbox(\"0\", label=\"Index\", interactive=False)\n", "\n", " prompt_display = gr.JSON(value=df.iloc[0]['conversation_RFA_gpt3_5'], label=\"Prompt\")\n", " \n", " with gr.Row():\n", " prev_button = gr.Button(\"⬅️ Previous\")\n", " next_button = gr.Button(\"Next ➡️\")\n", " \n", " # State to hold the current index\n", " index_state = gr.State(value=0)\n", "\n", " # Button click events\n", " prev_button.click(\n", " fn=previous_prompt,\n", " inputs=[index_state, prompt_type_dropdown],\n", " outputs=[index_state, prompt_display]\n", " )\n", " next_button.click(\n", " fn=next_prompt,\n", " inputs=[index_state, prompt_type_dropdown],\n", " outputs=[index_state, prompt_display]\n", " )\n", "\n", " # Dropdown change event\n", " prompt_type_dropdown.change(\n", " fn=lambda index, prompt_type: get_prompt(index, prompt_type),\n", " inputs=[index_state, 
prompt_type_dropdown],\n", " outputs=prompt_display\n", " )\n", "\n", " # Update index display\n", " index_state.change(\n", " fn=lambda index: str(index),\n", " inputs=index_state,\n", " outputs=index_display\n", " )\n", "\n", "# Launch the app\n", "demo.launch(height=900)" ] }, { "cell_type": "code", "execution_count": 23, "id": "e60d1ea0-d717-47ed-b5cf-97c32b53544e", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'json_str' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mjson_str\u001b[49m\n", "\u001b[0;31mNameError\u001b[0m: name 'json_str' is not defined" ] } ], "source": [ "json_str" ] }, { "cell_type": "code", "execution_count": null, "id": "7903b9e7-36ee-463a-be38-06ee2614be1d", "metadata": {}, "outputs": [], "source": [ "import base64\n", "\n", "from IPython.display import display, HTML\n", "\n", "gr_cols = ['conversation_RFA_gpt3_5', 'conversation_RFA_falcon',\n", " 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon',\n", " 'conversation_FA']\n", "gr_df = df[gr_cols]\n", "json_str = json.dumps(gr_df.head(20).to_dict())\n", "encoded_data = base64.b64encode(json_str.encode()).decode()\n", "\n", "code = f'''\n", "\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", " import json\n", " import gradio as gr\n", " import pandas as pd\n", " import base64\n", "\n", " encoded_data = \"{encoded_data}\"\n", " decoded_data = json.loads(base64.b64decode(encoded_data).decode())\n", " \n", " df = pd.DataFrame(decoded_data)\n", "\n", "\n", " # Functions to handle prompts\n", " def get_prompt(index, prompt_type):\n", " return df.iloc[index][prompt_type]\n", " \n", " def next_prompt(index, prompt_type):\n", " if index < len(df) - 1:\n", " index += 1\n", " return index, get_prompt(index, 
prompt_type)\n", " \n", " def previous_prompt(index, prompt_type):\n", " if index > 0:\n", " index -= 1\n", " return index, get_prompt(index, prompt_type)\n", " \n", " # Gradio App\n", " with gr.Blocks() as demo:\n", " gr.Markdown(\"# Prompt Browser\")\n", " with gr.Row():\n", " prompt_type_dropdown = gr.Dropdown(\n", " choices=list(df.columns),\n", " value=list(df.columns)[0],\n", " label=\"Select Prompt Type\"\n", " )\n", " index_display = gr.Textbox(\"0\", label=\"Index\", interactive=False)\n", " \n", " prompt_display = gr.JSON(value=df.iloc[0][list(df.columns)[0]], label=\"Prompt\")\n", " \n", " with gr.Row():\n", " prev_button = gr.Button(\"⬅️ Previous\")\n", " next_button = gr.Button(\"Next ➡️\")\n", " \n", " # State to hold the current index\n", " index_state = gr.State(value=0)\n", " \n", " # Button click events\n", " prev_button.click(\n", " fn=previous_prompt,\n", " inputs=[index_state, prompt_type_dropdown],\n", " outputs=[index_state, prompt_display]\n", " )\n", " next_button.click(\n", " fn=next_prompt,\n", " inputs=[index_state, prompt_type_dropdown],\n", " outputs=[index_state, prompt_display]\n", " )\n", " \n", " # Dropdown change event\n", " prompt_type_dropdown.change(\n", " fn=lambda index, prompt_type: get_prompt(index, prompt_type),\n", " inputs=[index_state, prompt_type_dropdown],\n", " outputs=prompt_display\n", " )\n", " \n", " # Update index display\n", " index_state.change(\n", " fn=lambda index: str(index),\n", " inputs=index_state,\n", " outputs=index_display\n", " )\n", " \n", " # Launch the app\n", " demo.launch(height=900)\n", " \n", " \n", "\t\n", "\n", "'''\n", "\n", "display(HTML(code))" ] }, { "cell_type": "markdown", "id": "c086d26e-4c90-4b31-9ae2-77a3bdeffdfd", "metadata": {}, "source": [ "## Push Dataset to the Hub\n", "There is a catch in our format... alas content is a `dict` right now for the \"assistant\". In \"system\" and \"user\" its a string. 
`Datasets` is based on Parquet/Arrow, which require every column to have a fixed type, so `content` must always be either a str or a dict. I'll cast it to str for simplicity." ] }, { "cell_type": "code", "execution_count": null, "id": "3ed415c2-cdc7-4549-8edf-98030cb7c61c", "metadata": {}, "outputs": [], "source": [ "cols_to_cast = ['conversation_RFA_gpt3_5', 'conversation_RFA_falcon', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon', 'conversation_FA']\n", "\n", "def cast_content_keys_to_string(conversation):\n", "    # Index 2 is the assistant message; its content is a dict, unlike system/user\n", "    assistant_message = conversation[2]\n", "    assistant_message['content'] = str(assistant_message['content'])\n", "    return conversation\n", "\n", "# Apply the function to every conversation column\n", "for col in cols_to_cast:\n", "    df.loc[:, col] = df[col].apply(cast_content_keys_to_string)" ] }, { "cell_type": "markdown", "id": "6f609d12-518f-4830-8f8b-c374dbeaba7d", "metadata": {}, "source": [ "It's useful to get a train/test split; then we convert to `Dataset` and push to the Hub. We also want to stratify on `'topic'`." 
] }, { "cell_type": "code", "execution_count": null, "id": "25f62e9b-09f8-4912-94fd-0ded680614b2", "metadata": { "tags": [] }, "outputs": [], "source": [ "from datasets import Dataset, DatasetDict\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Split into train (80%) and test (20%), stratified on topic\n", "train_df, test_df = train_test_split(df, test_size=0.2, stratify=df['topic'], random_state=42)\n", "\n", "# Reset index to avoid index column in the Hugging Face Dataset\n", "train_df.reset_index(drop=True, inplace=True)\n", "test_df.reset_index(drop=True, inplace=True)\n", "\n", "# Convert each DataFrame to a Dataset object\n", "train_dataset = Dataset.from_pandas(train_df)\n", "test_dataset = Dataset.from_pandas(test_df)\n", "\n", "# Create a DatasetDict with the train and test datasets\n", "dataset_dict = DatasetDict({\n", "    'train': train_dataset,\n", "    'test': test_dataset\n", "})" ] }, { "cell_type": "code", "execution_count": null, "id": "18a206c5-0e40-46b3-8dfb-20000789b6b5", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Push the dataset to the Hugging Face Hub\n", "dataset_dict.push_to_hub(OUTPUT_DATASET)" ] }, { "cell_type": "code", "execution_count": null, "id": "5ce2886f-1a38-4185-8561-2e3094c94a26", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }