Incorrect answers and explanations:\\n\\n1. Elec...
\n",
+ "
- (a) Seconds and minutes: Satellite technolog...
\n",
" \n",
"
\n",
"
1
\n",
@@ -215,7 +219,7 @@
"
(a) Relieve pain: This option is not correct b...
\n",
"
What does irradiating food do?
\n",
"
(a) Relieve pain (b) Enhance food's nutrients ...
\n",
- "
Sure, let's examine each answer and justify wh...
\n",
+ "
(a) Relieve pain: Irradiating food does not ha...
\n",
"
\n",
"
\n",
"
2
\n",
@@ -226,7 +230,7 @@
"
b) Exfoliation: Exfoliation is the process of ...
\n",
"
What protects a mammal's skin?
\n",
"
(a) Fiber follicles (b) Exfoliation (c) Resist...
\n",
- "
Sure, let's go through each of the provided an...
\n",
+ "
(a) **Fiber follicles**: This is the correct a...
\n",
"
\n",
"
\n",
"
3
\n",
@@ -237,7 +241,7 @@
"
a) Dies: This option is not correct because ea...
\n",
"
What do earthworms do when a segment breaks off?
\n",
"
(a) Dies (b) Regrows it (c) Reproduces (d) Sed...
\n",
- "
1. Reading the question carefully, we can see ...
\n",
+ "
1. **Option (a): Dies**\\n - Earthworms are s...
\n",
"
\n",
"
\n",
"
4
\n",
@@ -248,7 +252,7 @@
"
b) Rainstorms: Lightning is actually a natural...
\n",
"
Lightning can be bad for what?
\n",
"
(a) The environment (b) Rainstorms (c) Destruc...
\n",
- "
1. Food: While essential for the growth and he...
\n",
+ "
(a) The environment: Lightning can release lar...
\n",
"
\n",
"
\n",
"
...
\n",
@@ -270,7 +274,7 @@
"
a) Bandaging open sores is not the correct ans...
\n",
"
Organisms that can cause infection do what?
\n",
"
(a) Bandage open sores (b) Keep flesh clean (c...
\n",
- "
1. Read the question and options carefully: Th...
\n",
+ "
(a) Bandage open sores: This action is typical...
\n",
"
\n",
"
\n",
"
8409
\n",
@@ -281,7 +285,7 @@
"
b) Fungi are living things that can make their...
\n",
"
Fungi are living things that cannot make their...
\n",
"
(a) Food (b) Cells (c) Energy (d) Fruits (e) H...
\n",
- "
1. Read the question and options carefully: Th...
\n",
+ "
1. **Read the question and options carefully.*...
\n",
"
\n",
"
\n",
"
8410
\n",
@@ -292,7 +296,7 @@
"
a) Metabolic reaction: This option is incorrec...
\n",
"
An overheated body can use water for:?
\n",
"
(a) Metabolic reaction (b) Dehydrating (c) Rai...
\n",
- "
1. Read the question and options carefully: Th...
\n",
+ "
- (a) Metabolic reaction: This is incorrect be...
\n",
"
\n",
"
\n",
"
8411
\n",
@@ -303,7 +307,7 @@
"
a) Electrons are involved in cellular respirat...
\n",
"
What is essential for cellular respiration for...
\n",
"
(a) Electron (b) Glucose (c) Energy (d) Energy...
\n",
- "
1. First, let's read the question and options ...
\n",
+ "
1. **Glucose (b)**: Glucose is one of the reac...
\n",
"
\n",
"
\n",
"
8412
\n",
@@ -314,7 +318,7 @@
"
a) H2O: Water is essential for life, but it do...
\n",
"
What helps insulate and protect the body?
\n",
"
(a) H2o (b) Living cells in follicles (c) Laye...
\n",
- "
1. Read the question and options carefully: Th...
\n",
+ "
1. **Read the question and options carefully.*...
\n",
"
\n",
" \n",
"\n",
@@ -387,18 +391,18 @@
"8411 (a) Electron (b) Glucose (c) Energy (d) Energy... \n",
"8412 (a) H2o (b) Living cells in follicles (c) Laye... \n",
"\n",
- " mistral_reasoning \n",
- "0 Incorrect answers and explanations:\\n\\n1. Elec... \n",
- "1 Sure, let's examine each answer and justify wh... \n",
- "2 Sure, let's go through each of the provided an... \n",
- "3 1. Reading the question carefully, we can see ... \n",
- "4 1. Food: While essential for the growth and he... \n",
+ " falcon_reasoning \n",
+ "0 - (a) Seconds and minutes: Satellite technolog... \n",
+ "1 (a) Relieve pain: Irradiating food does not ha... \n",
+ "2 (a) **Fiber follicles**: This is the correct a... \n",
+ "3 1. **Option (a): Dies**\\n - Earthworms are s... \n",
+ "4 (a) The environment: Lightning can release lar... \n",
"... ... \n",
- "8408 1. Read the question and options carefully: Th... \n",
- "8409 1. Read the question and options carefully: Th... \n",
- "8410 1. Read the question and options carefully: Th... \n",
- "8411 1. First, let's read the question and options ... \n",
- "8412 1. Read the question and options carefully: Th... \n",
+ "8408 (a) Bandage open sores: This action is typical... \n",
+ "8409 1. **Read the question and options carefully.*... \n",
+ "8410 - (a) Metabolic reaction: This is incorrect be... \n",
+ "8411 1. **Glucose (b)**: Glucose is one of the reac... \n",
+ "8412 1. **Read the question and options carefully.*... \n",
"\n",
"[8413 rows x 8 columns]"
]
@@ -415,14 +419,21 @@
"# Convert to pandas dataframe\n",
"df = dataset.to_pandas()\n",
"print(f\"Before Cleaning: {len(df)} rows\")\n",
+ "print(df.columns)\n",
"\n",
"# Drop the __index_level_0__ column if it exists\n",
- "df.drop(columns=['mistral_reasoning_prompt'], errors='ignore', inplace=True)\n",
+ "df.drop(columns=['falcon_reasoning_prompt'], errors='ignore', inplace=True)\n",
"\n",
"# Ensure all values in 'formatted_question' are strings\n",
"df.rename(columns={\n",
" 'explanation': 'gpt3_5_reasoning',\n",
"}, inplace=True)\n",
+ "\n",
+ "# Fix formatting: drop double quotes from the questions and swap them for single quotes in the reasoning\n",
+ "df['question_text'] = df['question_text'].str.replace('\"', '', regex=False)\n",
+ "df['gpt3_5_reasoning'] = df['gpt3_5_reasoning'].str.replace('\"', \"'\", regex=False)\n",
+ "df['falcon_reasoning'] = df['falcon_reasoning'].str.replace('\"', \"'\", regex=False)\n",
+ "\n",
"df"
]
},
@@ -439,105 +450,274 @@
"id": "d124c7cf-a369-46a9-94db-069894145959",
"metadata": {},
"source": [
- "We need to convert our sample into a format similar to below for each of the scenarios.\n",
+ "We need to convert each sample into a format like the one below for each of the scenarios. This is ideal since we can use [chat templates](https://huggingface.co/docs/transformers/en/chat_templating) to easily switch between models, which might have different special tokens.\n",
"\n",
"```\n",
"[\n",
+ " {\"content\": system_prompt, \"role\": \"system\"},\n",
" {\"content\": user_content, \"role\": \"user\"},\n",
" {\"content\": assistant_response, \"role\": \"assistant\"}\n",
"]\n",
"```\n",
"\n",
- "We should include a helpful system_prompt with a general trivia prefix, and a suffix that contains instructions that fit each scenario.\n",
+ "We should include a helpful `system_prompt` with a general trivia prefix, and a suffix that contains instructions that fit each scenario.\n",
"The `user_content` will have the Question and answer choices.\n",
"The `assistant_response` should reflect the scenario. "
]
},
+ {
+ "cell_type": "markdown",
+ "id": "c85b3c11-18d7-4854-a0ba-ad0c1407fd6d",
+ "metadata": {},
+ "source": [
+ "It's best to understand `template_blocks` in a couple of layers. \n",
+ "- The top layer (macro) allows me to decide which pieces I want to include. Sometimes I want just the `system`+`user` messages, and for fine-tuning I'll want `system`+`user`+`assistant`\n",
+ "- System+User:\n",
+ " - Inside this layer I use Jinja to interpolate the values I want to add\n",
+ " - I moved `user_content` out to get a feel for how it looks\n",
+ "- Assistant:\n",
+ " - Here we have an if statement to allow me to choose between FA, RFA and FAR\n",
+ " - Inside that we just have the same interpolation as seen elsewhere\n",
+ "\n",
+ "You can see the JSON for the messages structure in `initial` and `full`. Here I'm selecting which macros I want to use."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 7,
- "id": "1c6554a6-4717-4bf0-ae51-102630d40fd7",
- "metadata": {
- "tags": []
- },
+ "id": "f42f3c34-f736-4e1c-b904-418caf2b0de1",
+ "metadata": {},
"outputs": [],
"source": [
- "df['user_prompt'] = df.apply(lambda row: f\"Question: {row['question_text']}\\nAnswer Choices: {row['answer_choices']}\", axis=1)"
+ "from jinja2 import Environment, DictLoader\n",
+ "\n",
+ "template_blocks = '''\n",
+ "{%- macro user_message(system_content, question_text, answer_choices) -%}\n",
+ "{\n",
+ " \"role\": \"system\",\n",
+ " \"content\": {{ system_content }}\n",
+ "},\n",
+ "{\n",
+ " \"role\": \"user\",\n",
+ " \"content\": \"Question: {{ question_text }}\\\\nAnswer Choices: {{ answer_choices }}\"\n",
+ "\n",
+ "}\n",
+ "{%- endmacro %}\n",
+ "\n",
+ "{% macro assistant_response(reasoning, answer_key, response_order='default') -%}\n",
+ "{\n",
+ " \"role\": \"assistant\",\n",
+ " \"content\": {\n",
+ " {% if response_order == 'rfa' -%}\n",
+ " \"reasoning\": {{ reasoning | tojson }},\n",
+ " \"final_answer\": \"{{ answer_key }}\"\n",
+ " {% elif response_order == 'far' -%}\n",
+ " \"final_answer\": \"{{ answer_key }}\",\n",
+ " \"reasoning\": {{ reasoning | tojson }}\n",
+ " {% else -%}\n",
+ " \"final_answer\": \"{{ answer_key }}\"\n",
+ " {% endif %}\n",
+ " }\n",
+ "}\n",
+ "{%- endmacro %}\n",
+ "'''\n",
+ "\n",
+ "# System + User only (initial template)\n",
+ "initial = '''\n",
+ "[\n",
+ " {{ user_message(system_content, question_text, answer_choices) }}\n",
+ "]\n",
+ "'''\n",
+ "\n",
+ "# Full conversation template\n",
+ "full = '''\n",
+ "[\n",
+ " {{ user_message(system_content, question_text, answer_choices) }},\n",
+ " {{ assistant_response(reasoning, answer_key, response_order) }}\n",
+ "]\n",
+ "'''\n",
+ "\n",
+ "# Create Jinja environment and load templates\n",
+ "env = Environment(loader=DictLoader({\n",
+ " 'template_blocks': template_blocks,\n",
+ " 'initial': initial,\n",
+ " 'full': full\n",
+ "}))\n",
+ "\n",
+ "# Load the macro definitions into the environment\n",
+ "macro_template = env.get_template('template_blocks')\n",
+ "env.globals.update(macro_template.module.__dict__)\n",
+ "\n",
+ "# Compile full and initial templates\n",
+ "full_template = env.get_template('full')\n",
+ "initial_template = env.get_template('initial')"
]
},
{
"cell_type": "markdown",
- "id": "1391b7e1-2462-41e3-b2f0-5a8f6c95859b",
+ "id": "35b2b21b-6ecf-490d-a453-7d679e3b1877",
+ "metadata": {},
+ "source": [
+ "### Reasoning Final Answer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "eccb2f71-70a9-41fc-8235-d58b8876bdf1",
"metadata": {},
+ "outputs": [],
"source": [
- "Here we need to create the structure of our conversation. Each system prompt should reflect the instructions we want, so we can start with a prefix and add in the specifics for each scenario."
+ "rfa_system_content = 'Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.'\n",
+ "rfa_system_content = json.dumps(rfa_system_content)\n",
+ "\n",
+ "# USER Prompt\n",
+ "df['user_prompt_RFA'] = df.apply(lambda row: initial_template.render(\n",
+ " system_content=rfa_system_content,\n",
+ " question_text=row['question_text'],\n",
+ " answer_choices=row['answer_choices']\n",
+ "), axis=1)\n",
+ "df['user_prompt_RFA'] = df['user_prompt_RFA'].apply(json.loads)"
]
},
{
"cell_type": "markdown",
- "id": "2c817304-9ecc-43d9-83b9-5bb8dd662797",
+ "id": "778538da-9290-4815-b792-6c632f3d398f",
"metadata": {},
"source": [
- "### Reasoning Final Answer Structured Generation"
+ "#### RFA ChatGPT 3.5 Example"
]
},
{
"cell_type": "code",
- "execution_count": 8,
- "id": "878727dc-4801-4376-be26-1cc601cb5f92",
+ "execution_count": 9,
+ "id": "bb6bf32e-9d2c-40a4-a10c-c1c3df16bf1f",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "def generate_full_conversation(row, reasoning_key):\n",
+ " rfa_template_input = {\n",
+ " 'system_content': rfa_system_content,\n",
+ " 'question_text': row['question_text'],\n",
+ " 'answer_choices': row['answer_choices'],\n",
+ " 'answer_key': row['answer_key'],\n",
+ " 'response_order': 'rfa'\n",
+ " }\n",
+ " return full_template.render(**rfa_template_input, reasoning=row[reasoning_key])\n",
+ "\n",
+ "# Full Conversation GPT3.5\n",
+ "df['conversation_RFA_gpt3_5'] = df.apply(lambda row: generate_full_conversation(row, 'gpt3_5_reasoning'), axis=1)\n",
+ "df['conversation_RFA_gpt3_5'] = df['conversation_RFA_gpt3_5'].apply(json.loads)\n",
+ "\n",
+ "# Full Conversation Falcon\n",
+ "df['conversation_RFA_falcon'] = df.apply(lambda row: generate_full_conversation(row, 'falcon_reasoning'), axis=1)\n",
+ "df['conversation_RFA_falcon'] = df['conversation_RFA_falcon'].apply(json.loads)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "c22bae14-d5c2-4ed0-8d52-f98d6a4f24b2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "[INST] Answer the Question and include your Reasoning and the Final Answer in a json like: {\"Reasoning: \"...\", \"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\n",
+ "Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.\n",
+ "---\n",
"Question: What is satellite technology used for predicting?\n",
- "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius[/INST] {'Reasoning': \"a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\\n\\nb) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\\n\\nd) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\\n\\ne) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\\n\\nf) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\\n\\ng) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. 
While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\\n\\nh) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\\n\\nTherefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\", 'Final Answer': 'c'}\n",
+ "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n",
+ "---\n",
+ "dict_keys(['reasoning', 'final_answer'])\n",
+ "---\n",
+ "a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\n",
"\n",
+ "b) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\n",
"\n",
+ "d) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\n",
"\n",
- "[INST] Answer the Question and include your Reasoning and the Final Answer in a json like: {\"Reasoning: \"...\", \"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\n",
+ "e) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\n",
+ "\n",
+ "f) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\n",
+ "\n",
+ "g) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\n",
+ "\n",
+ "h) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\n",
+ "\n",
+ "Therefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\n"
+ ]
+ }
+ ],
+ "source": [
+ "rfa_test_row = df.conversation_RFA_gpt3_5.iloc[0]\n",
+ "print(rfa_test_row[0]['content'])\n",
+ "print('---')\n",
+ "print(rfa_test_row[1]['content'])\n",
+ "print('---')\n",
+ "print(rfa_test_row[2]['content'].keys())\n",
+ "print('---')\n",
+ "print(rfa_test_row[2]['content']['reasoning'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5f1ca469-473f-423d-b673-a7e7278f9bbb",
+ "metadata": {},
+ "source": [
+ "#### RFA Falcon Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "f6b86fc1-beb0-40c8-9293-0824b7926b7b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Answer the Question and include your reasoning and the final answer in a json like: {\"reasoning\": , \"final_answer\": }.\n",
+ "---\n",
"Question: What is satellite technology used for predicting?\n",
- "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius[/INST] {'Reasoning': 'Incorrect answers and explanations:\\n\\n1. Electrical energy: Cells are not visible with electrical energy. Cells are physical structures and need to be observed visually, not through electrical means.\\n\\n2. By indirect light or Bright lights: Cells are so small that they are not visible to the naked eye under normal lighting conditions. Indirect light or bright lights are not a viable method to observe cells.\\n\\n3. Camera lenses: Camera lenses are used to focus and capture images, not to observe live objects. In order to observe cells, they must be viewed through a microscope which has a lens.\\n\\n4. Colors: Because cells are transparent, they do not have any specific color. They can appear as white or translucent when observed under a microscope.\\n\\n5. Telescope: A telescope is used for observing distant stars, galaxies, and other celestial objects, not for observing cells which are much smaller and earthbound.\\n\\n6. Rays or beams: Cells cannot be observed using rays or beams. Observation of cells is usually done via light microscopy.\\n\\n7. None of the above do not provide the means to observe cells, as they are not the tools or methods designed for cellular observation.\\n\\nThe correct answer (d) A microscope is the tool that allows us to see cells due to its magnifying capabilities making the cells visible to the naked eye effectively.', 'Final Answer': 'c'}\n"
+ "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n",
+ "---\n",
+ "dict_keys(['reasoning', 'final_answer'])\n",
+ "---\n",
+ "- (a) Seconds and minutes: Satellite technology is not used to predict seconds and minutes. This is too specific and not what satellite technology is generally used for. Satellite technology is used for broader time scales, such as days, weeks, months, and years.\n",
+ "- (b) The strength and magnitude of an earthquake: While some types of satellite data can be used in conjunction with other information to study seismicity, earthquakes themselves are typically predicted using other methods, like seismographs and geological studies.\n",
+ "- (d) 70-75 degrees fahrenheit: This is a specific temperature range, and satellites cannot predict exact temperature ranges like this. While satellites do play a role in weather prediction, which includes temperature, predicting specific ranges within a day isn't a typical application.\n",
+ "- (e) Rapid changes occur: While satellites can detect rapid changes in many areas, such as cloud cover, weather fronts, or volcanic activity, the phrase 'rapid changes occur' is too vague. It could apply to various phenomena, so it's not specifically about predictions related to satellite technology alone.\n",
+ "- (f) Dead-ends and false starts: These descriptions relate to human behavior and decision-making processes, not predictions associated with satellite technology.\n",
+ "- (g) Snow, ice, and rock: This is a very specific combination of weather and geological features. While satellite technology can certainly be used to monitor snow, ice, and rock & soil changes (for landslide detection, for example), it's not specifically about predicting these phenomena in the way the question suggests.\n",
+ "- (h) Around 5 to 27 degrees celsius: Similar to (d), this is a specific temperature range, and satellites alone cannot predict specific temperature ranges so precisely.\n",
+ "\n",
+ "The correct answer, (c) What it's like outside each day, touches on a broader range of environmental conditions, which satellite technology can provide data on. Satellites track weather patterns, cloud cover, sunlight, and other indicators that give us an idea of the outside conditions almost daily. \n",
+ "\n",
+ "This broad, daily overview of environmental conditions is something that satellites can provide, unlike the more narrow and specific predictions mentioned in the other options. Although it's not a technical 'prediction' in the sense of forecasting exact events all the time, it does provide up-to-date and current information about what it's like outside, which can be used to make informed decisions and inform other predictions related to weather and environment.\n"
]
}
],
"source": [
- "# Define system prompt\n",
- "system_prompt_RFA = 'Answer the Question and include your Reasoning and the Final Answer in a json like: {\"Reasoning: \"...\", \"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.'\n",
- "\n",
- "df['assistant_prompt_RFA_gpt3_5'] = df.apply(lambda row: {\"Reasoning\": row[\"gpt3_5_reasoning\"].strip(), \"Final Answer\": row[\"answer_key\"]}, axis=1)\n",
- "df['assistant_prompt_RFA_mistral'] = df.apply(lambda row: {\"Reasoning\": row[\"mistral_reasoning\"].strip(), \"Final Answer\": row[\"answer_key\"]}, axis=1)\n",
- "\n",
- "# Step 1: Create user prompt for both gpt3_5 and mistral\n",
- "df['user_prompt_RFA'] = df.apply(lambda row: {\n",
- " \"content\": system_prompt_RFA + '\\n' + row['user_prompt'],\n",
- " \"role\": \"user\"\n",
- "}, axis=1)\n",
- "\n",
- "# Step 2: Create conversation_RFA column using user_prompt_RFA\n",
- "df['conversation_RFA_gpt3_5'] = df.apply(lambda row: tokenizer.apply_chat_template([\n",
- " row['user_prompt_RFA'], # Use the precomputed user prompt\n",
- " {\"content\": row['assistant_prompt_RFA_gpt3_5'], \"role\": \"assistant\"}\n",
- "], tokenize=False), axis=1)\n",
- "\n",
- "df['conversation_RFA_mistral'] = df.apply(lambda row: tokenizer.apply_chat_template([\n",
- " row['user_prompt_RFA'], # Use the precomputed user prompt\n",
- " {\"content\": row['assistant_prompt_RFA_mistral'], \"role\": \"assistant\"}\n",
- "], tokenize=False), axis=1)\n",
- "\n",
- "df['user_prompt_RFA'] = df['user_prompt_RFA'].apply(lambda row: tokenizer.apply_chat_template([row], tokenize=False))\n",
- "\n",
- "df.drop(['assistant_prompt_RFA_gpt3_5', 'assistant_prompt_RFA_mistral'], inplace=True, axis=1)\n",
- "\n",
- "# Example output\n",
- "gpt3_5_example = df['conversation_RFA_gpt3_5'].iloc[0]\n",
- "mistral_example = df['conversation_RFA_mistral'].iloc[0]\n",
- "\n",
- "print(gpt3_5_example)\n",
- "print('\\n\\n')\n",
- "print(mistral_example)"
+ "rfa_test_row = df.conversation_RFA_falcon.iloc[0]\n",
+ "print(rfa_test_row[0]['content'])\n",
+ "print('---')\n",
+ "print(rfa_test_row[1]['content'])\n",
+ "print('---')\n",
+ "print(rfa_test_row[2]['content'].keys())\n",
+ "print('---')\n",
+ "print(rfa_test_row[2]['content']['reasoning'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e475bc17-03b4-45b7-9188-f60110325eff",
+ "metadata": {},
+ "source": [
+ "At this point we should feel pretty comfortable with our prompt, so let's repeat this for `FAR` and `FA`."
]
},
{
@@ -550,62 +730,153 @@
},
{
"cell_type": "code",
- "execution_count": 9,
- "id": "c0234242-7389-4c28-ae85-78a82498d0ac",
+ "execution_count": 12,
+ "id": "e5437ce4-f6be-4b0e-88e8-d87f8256945e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "far_system_content = 'Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.'\n",
+ "far_system_content = json.dumps(far_system_content)\n",
+ "\n",
+ "# USER Prompt\n",
+ "df['user_prompt_FAR'] = df.apply(lambda row: initial_template.render(\n",
+ " system_content=far_system_content,\n",
+ " question_text=row['question_text'],\n",
+ " answer_choices=row['answer_choices']\n",
+ "), axis=1)\n",
+ "df['user_prompt_FAR'] = df['user_prompt_FAR'].apply(json.loads)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "f3393931-e8d8-48d7-8734-a6a8f6afc032",
"metadata": {
- "tags": []
+ "scrolled": true
},
+ "outputs": [],
+ "source": [
+ "def generate_full_conversation(row, reasoning_key):\n",
+ " far_template_input = {\n",
+ " 'system_content': far_system_content,\n",
+ " 'question_text': row['question_text'],\n",
+ " 'answer_choices': row['answer_choices'],\n",
+ " 'answer_key': row['answer_key'],\n",
+ " 'response_order': 'far'\n",
+ " }\n",
+ " return full_template.render(**far_template_input, reasoning=row[reasoning_key])\n",
+ "\n",
+ "# Full Conversation GPT3.5\n",
+ "df['conversation_FAR_gpt3_5'] = df.apply(lambda row: generate_full_conversation(row, 'gpt3_5_reasoning'), axis=1)\n",
+ "df['conversation_FAR_gpt3_5'] = df['conversation_FAR_gpt3_5'].apply(json.loads)\n",
+ "\n",
+ "# Full Conversation Falcon\n",
+ "df['conversation_FAR_falcon'] = df.apply(lambda row: generate_full_conversation(row, 'falcon_reasoning'), axis=1)\n",
+ "df['conversation_FAR_falcon'] = df['conversation_FAR_falcon'].apply(json.loads)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef90916f-c37e-4684-9ce6-f89e08158403",
+ "metadata": {},
+ "source": [
+ "#### FAR ChatGPT 3.5 Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "24dcc34d-6c78-42bb-b24f-ee715f27f405",
+ "metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "[INST] Answer the Question and include your Final Answer and the Reasoning in a json like: {\"Final Answer\": \"x\", \"Reasoning: \"...\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\n",
+ "Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.\n",
+ "---\n",
"Question: What is satellite technology used for predicting?\n",
- "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius[/INST] {'Final Answer': 'c', 'Reasoning': \"a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\\n\\nb) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\\n\\nd) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\\n\\ne) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\\n\\nf) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\\n\\ng) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. 
While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\\n\\nh) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\\n\\nTherefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\"}\n",
+ "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n",
+ "---\n",
+ "dict_keys(['final_answer', 'reasoning'])\n",
+ "---\n",
+ "a) Seconds and minutes: This option is incorrect because satellite technology is not used for predicting time intervals. Satellite technology is used for various purposes such as communication, navigation, and weather forecasting, but it is not used for predicting time intervals.\n",
"\n",
+ "b) The strength and magnitude of an earthquake: This option is incorrect because satellite technology is not used for predicting earthquakes. Earthquake prediction is a complex process that involves seismology and other scientific methods, but satellite technology is not one of them.\n",
"\n",
+ "d) 70-75 degrees Fahrenheit: This option is incorrect because satellite technology is not used for predicting specific temperature ranges. While satellite technology can provide temperature data, it is not used for predicting specific temperature ranges.\n",
"\n",
- "[INST] Answer the Question and include your Final Answer and the Reasoning in a json like: {\"Final Answer\": \"x\", \"Reasoning: \"...\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\n",
+ "e) Rapid changes occur: This option is too vague and does not provide enough information to determine whether it is correct or not. Satellite technology can be used to monitor changes in various environmental factors, but it is not used specifically for predicting rapid changes.\n",
+ "\n",
+ "f) Dead-ends and false starts: This option is incorrect because it is not related to satellite technology or any type of prediction.\n",
+ "\n",
+ "g) Snow, ice, and rock: This option is incorrect because it is too specific and does not cover the broad range of predictions that satellite technology can be used for. While satellite technology can be used to monitor snow, ice, and rock formations, it is not used exclusively for this purpose.\n",
+ "\n",
+ "h) Around 5 to 27 degrees Celsius: This option is incorrect because it is too specific and does not cover the broad range of temperature predictions that satellite technology can be used for. While satellite technology can provide temperature data, it is not used exclusively for predicting temperatures within a specific range.\n",
+ "\n",
+ "Therefore, the correct answer is c) what it's like outside each day, as satellite technology is commonly used for weather forecasting and predicting daily weather conditions.\n"
+ ]
+ }
+ ],
+ "source": [
+ "far_test_row = df.conversation_FAR_gpt3_5.iloc[0]\n",
+ "print(far_test_row[0]['content'])\n",
+ "print('---')\n",
+ "print(far_test_row[1]['content'])\n",
+ "print('---')\n",
+ "print(far_test_row[2]['content'].keys())\n",
+ "print('---')\n",
+ "print(far_test_row[2]['content']['reasoning'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f76c6fd3-149e-4db1-b333-8e2c451286cb",
+ "metadata": {},
+ "source": [
+ "#### FAR Falcon Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "c28f8e7e-bce0-4855-9d25-42a1f5e9b0cd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Answer the Question and include your Final Answer and the Reasoning in a json like: {\"final_answer\": , \"reasoning\": }.\n",
+ "---\n",
"Question: What is satellite technology used for predicting?\n",
- "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius[/INST] {'Final Answer': 'c', 'Reasoning': 'Incorrect answers and explanations:\\n\\n1. Electrical energy: Cells are not visible with electrical energy. Cells are physical structures and need to be observed visually, not through electrical means.\\n\\n2. By indirect light or Bright lights: Cells are so small that they are not visible to the naked eye under normal lighting conditions. Indirect light or bright lights are not a viable method to observe cells.\\n\\n3. Camera lenses: Camera lenses are used to focus and capture images, not to observe live objects. In order to observe cells, they must be viewed through a microscope which has a lens.\\n\\n4. Colors: Because cells are transparent, they do not have any specific color. They can appear as white or translucent when observed under a microscope.\\n\\n5. Telescope: A telescope is used for observing distant stars, galaxies, and other celestial objects, not for observing cells which are much smaller and earthbound.\\n\\n6. Rays or beams: Cells cannot be observed using rays or beams. Observation of cells is usually done via light microscopy.\\n\\n7. None of the above do not provide the means to observe cells, as they are not the tools or methods designed for cellular observation.\\n\\nThe correct answer (d) A microscope is the tool that allows us to see cells due to its magnifying capabilities making the cells visible to the naked eye effectively.'}\n"
+ "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n",
+ "---\n",
+ "dict_keys(['final_answer', 'reasoning'])\n",
+ "---\n",
+ "- (a) Seconds and minutes: Satellite technology is not used to predict seconds and minutes. This is too specific and not what satellite technology is generally used for. Satellite technology is used for broader time scales, such as days, weeks, months, and years.\n",
+ "- (b) The strength and magnitude of an earthquake: While some types of satellite data can be used in conjunction with other information to study seismicity, earthquakes themselves are typically predicted using other methods, like seismographs and geological studies.\n",
+ "- (d) 70-75 degrees fahrenheit: This is a specific temperature range, and satellites cannot predict exact temperature ranges like this. While satellites do play a role in weather prediction, which includes temperature, predicting specific ranges within a day isn't a typical application.\n",
+ "- (e) Rapid changes occur: While satellites can detect rapid changes in many areas, such as cloud cover, weather fronts, or volcanic activity, the phrase 'rapid changes occur' is too vague. It could apply to various phenomena, so it's not specifically about predictions related to satellite technology alone.\n",
+ "- (f) Dead-ends and false starts: These descriptions relate to human behavior and decision-making processes, not predictions associated with satellite technology.\n",
+ "- (g) Snow, ice, and rock: This is a very specific combination of weather and geological features. While satellite technology can certainly be used to monitor snow, ice, and rock & soil changes (for landslide detection, for example), it's not specifically about predicting these phenomena in the way the question suggests.\n",
+ "- (h) Around 5 to 27 degrees celsius: Similar to (d), this is a specific temperature range, and satellites alone cannot predict specific temperature ranges so precisely.\n",
+ "\n",
+ "The correct answer, (c) What it's like outside each day, touches on a broader range of environmental conditions, which satellite technology can provide data on. Satellites track weather patterns, cloud cover, sunlight, and other indicators that give us an idea of the outside conditions almost daily. \n",
+ "\n",
+ "This broad, daily overview of environmental conditions is something that satellites can provide, unlike the more narrow and specific predictions mentioned in the other options. Although it's not a technical 'prediction' in the sense of forecasting exact events all the time, it does provide up-to-date and current information about what it's like outside, which can be used to make informed decisions and inform other predictions related to weather and environment.\n"
]
}
],
"source": [
- "system_prompt_FAR = 'Answer the Question and include your Final Answer and the Reasoning in a json like: {\"Final Answer\": \"x\", \"Reasoning: \"...\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.'\n",
- "\n",
- "df['assistant_prompt_FAR_gpt3_5'] = df.apply(lambda row: {\"Final Answer\": row[\"answer_key\"], \"Reasoning\": row[\"gpt3_5_reasoning\"].strip()}, axis=1)\n",
- "df['assistant_prompt_FAR_mistral'] = df.apply(lambda row: {\"Final Answer\": row[\"answer_key\"], \"Reasoning\": row[\"mistral_reasoning\"].strip()}, axis=1)\n",
- "\n",
- "# Step 1: Create user_prompt_FAR column\n",
- "df['user_prompt_FAR'] = df.apply(lambda row: {\n",
- " \"content\": system_prompt_FAR + '\\n' + row['user_prompt'],\n",
- " \"role\": \"user\"\n",
- "}, axis=1)\n",
- "\n",
- "# Step 2: Create conversation_FAR column using user_prompt_FAR\n",
- "df['conversation_FAR_gpt3_5'] = df.apply(lambda row: tokenizer.apply_chat_template([\n",
- " row['user_prompt_FAR'], # Use the precomputed user prompt\n",
- " {\"content\": row['assistant_prompt_FAR_gpt3_5'], \"role\": \"assistant\"}\n",
- "], tokenize=False), axis=1)\n",
- "\n",
- "df['conversation_FAR_mistral'] = df.apply(lambda row: tokenizer.apply_chat_template([\n",
- " row['user_prompt_FAR'], # Use the precomputed user prompt\n",
- " {\"content\": row['assistant_prompt_FAR_mistral'], \"role\": \"assistant\"}\n",
- "], tokenize=False), axis=1)\n",
- "\n",
- "df['user_prompt_FAR'] = df['user_prompt_FAR'].apply(lambda row: tokenizer.apply_chat_template([row], tokenize=False))\n",
- "\n",
- "df.drop(['assistant_prompt_FAR_gpt3_5', 'assistant_prompt_FAR_mistral'], inplace=True, axis=1)\n",
- "\n",
- "# Example output\n",
- "gpt3_5_example = df['conversation_FAR_gpt3_5'].iloc[0]\n",
- "mistral_example = df['conversation_FAR_mistral'].iloc[0]\n",
- "\n",
- "print(gpt3_5_example)\n",
- "print('\\n\\n')\n",
- "print(mistral_example)"
+ "far_test_row = df.conversation_FAR_falcon.iloc[0]\n",
+ "print(far_test_row[0]['content'])\n",
+ "print('---')\n",
+ "print(far_test_row[1]['content'])\n",
+ "print('---')\n",
+ "print(far_test_row[2]['content'].keys())\n",
+ "print('---')\n",
+ "print(far_test_row[2]['content']['reasoning'])"
]
},
{
@@ -618,49 +889,81 @@
},
{
"cell_type": "code",
- "execution_count": 10,
- "id": "64dd3601-a40e-478d-97a5-5728005e5787",
+ "execution_count": 16,
+ "id": "5a48d4a6-66bd-4941-9227-720fb7cde805",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fa_system_content = 'Answer the Question and include your Final Answer in a json like: {\"final_answer\": }.'\n",
+ "fa_system_content = json.dumps(fa_system_content)\n",
+ "\n",
+ "# User prompt: render the prompt template, then parse the resulting JSON into a list of messages\n",
+ "df['user_prompt_FA'] = df.apply(lambda row: initial_template.render(\n",
+ " system_content=fa_system_content,\n",
+ " question_text=row['question_text'],\n",
+ " answer_choices=row['answer_choices']\n",
+ "), axis=1)\n",
+ "df['user_prompt_FA'] = df['user_prompt_FA'].apply(json.loads)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "23498249-1bee-424c-a0a2-db282c4e60b5",
"metadata": {
- "tags": []
+ "scrolled": true
},
+ "outputs": [],
+ "source": [
+ "def generate_full_conversation(row):\n",
+ " fa_template_input = {\n",
+ " 'system_content': fa_system_content,\n",
+ " 'question_text': row['question_text'],\n",
+ " 'answer_choices': row['answer_choices'],\n",
+ " 'answer_key': row['answer_key'],\n",
+ " 'response_order': 'fa'\n",
+ " }\n",
+ " return full_template.render(**fa_template_input)\n",
+ "\n",
+ "# Full conversation for the FA (final answer only) format; it does not depend on the reasoning model\n",
+ "df['conversation_FA'] = df.apply(generate_full_conversation, axis=1)\n",
+ "df['conversation_FA'] = df['conversation_FA'].apply(json.loads)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0afbe045-4361-4f92-92a0-243029151a43",
+ "metadata": {},
+ "source": [
+ "#### FA Example"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "9b37bcf3-0a6a-41b4-934f-932687181947",
+ "metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
- "[INST] Answer the Question and include your Final Answer in a json like: {\"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\n",
+ "Answer the Question and include your Final Answer in a json like: {\"final_answer\": }.\n",
+ "---\n",
"Question: What is satellite technology used for predicting?\n",
- "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius[/INST] {'Final Answer': 'c'}\n"
+ "Answer Choices: (a) Seconds and minutes (b) The strength and magnitude of an earthquake (c) What it's like outside each day (d) 70-75 degrees fahrenheit (e) Rapid changes occur (f) Dead-ends and false starts. (g) Snow, ice, and rock (h) Around 5 to 27 degrees celsius\n",
+ "---\n",
+ "{'final_answer': 'c'}\n"
]
}
],
"source": [
- "system_prompt_FA = 'Answer the Question and include your Final Answer in a json like: {\"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.'\n",
- "df['assistant_prompt_FA'] = df.apply(lambda row: {\"Final Answer\": row[\"answer_key\"]}, axis=1)\n",
- "\n",
- "# Step 1: Create user_prompt_FA column\n",
- "df['user_prompt_FA'] = df.apply(lambda row: {\n",
- " \"content\": system_prompt_FA + '\\n' + row['user_prompt'],\n",
- " \"role\": \"user\"\n",
- "}, axis=1)\n",
- "\n",
- "# Step 2: Create conversation_FA_R column using user_prompt_FA\n",
- "df['conversation_FA'] = df.apply(lambda row: tokenizer.apply_chat_template([\n",
- " row['user_prompt_FA'], # Use the precomputed user prompt\n",
- " {\"content\": row['assistant_prompt_FA'], \"role\": \"assistant\"}\n",
- " # {\"content\": json.dumps(row['assistant_prompt_FA']), \"role\": \"assistant\"}\n",
- "], tokenize=False), axis=1)\n",
- "\n",
- "\n",
- "df['user_prompt_FA'] = df['user_prompt_FA'].apply(lambda row: tokenizer.apply_chat_template([row], tokenize=False))\n",
- "\n",
- "df.drop(['assistant_prompt_FA'], inplace=True, axis=1)\n",
- "\n",
- "\n",
- "# Example output\n",
- "example = df['conversation_FA'].iloc[0]\n",
- "\n",
- "print(example)"
+ "fa_test_row = df.conversation_FA.iloc[0]\n",
+ "print(fa_test_row[0]['content'])\n",
+ "print('---')\n",
+ "print(fa_test_row[1]['content'])\n",
+ "print('---')\n",
+ "print(fa_test_row[2]['content'])"
]
},
{
@@ -673,7 +976,7 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 19,
"id": "69a687d5-35ab-4abb-8bb8-f975fa7be3f7",
"metadata": {
"tags": []
@@ -684,14 +987,13 @@
"text/plain": [
"Index(['formatted_question', 'combined_fact', 'answer_key', 'topic',\n",
" 'gpt3_5_reasoning', 'question_text', 'answer_choices',\n",
- " 'mistral_reasoning', 'user_prompt', 'user_prompt_RFA',\n",
- " 'conversation_RFA_gpt3_5', 'conversation_RFA_mistral',\n",
- " 'user_prompt_FAR', 'conversation_FAR_gpt3_5',\n",
- " 'conversation_FAR_mistral', 'user_prompt_FA', 'conversation_FA'],\n",
+ " 'falcon_reasoning', 'user_prompt_RFA', 'conversation_RFA_gpt3_5',\n",
+ " 'conversation_RFA_falcon', 'user_prompt_FAR', 'conversation_FAR_gpt3_5',\n",
+ " 'conversation_FAR_falcon', 'user_prompt_FA', 'conversation_FA'],\n",
" dtype='object')"
]
},
- "execution_count": 11,
+ "execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -702,22 +1004,22 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 20,
"id": "6ec3b6b5-a359-4da8-98c8-4dae75b8a2d4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
- "df = df[['topic', 'question_text', 'answer_key', 'gpt3_5_reasoning', 'mistral_reasoning', 'answer_choices', 'user_prompt', \n",
- " 'user_prompt_RFA', 'conversation_RFA_gpt3_5', 'conversation_RFA_mistral',\n",
- " 'user_prompt_FAR', 'conversation_FAR_gpt3_5', 'conversation_FAR_mistral',\n",
+ "df = df[['topic', 'question_text', 'answer_key', 'gpt3_5_reasoning', 'falcon_reasoning', 'answer_choices',\n",
+ " 'user_prompt_RFA', 'conversation_RFA_gpt3_5', 'conversation_RFA_falcon',\n",
+ " 'user_prompt_FAR', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon',\n",
" 'user_prompt_FA', 'conversation_FA']]"
]
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 21,
"id": "f9f3a5a8-e4ea-4ed9-9fc0-fdd1a19b1c92",
"metadata": {
"tags": []
@@ -748,15 +1050,14 @@
" <th>question_text</th>\n",
" <th>answer_key</th>\n",
" <th>gpt3_5_reasoning</th>\n",
- " <th>mistral_reasoning</th>\n",
+ " <th>falcon_reasoning</th>\n",
" <th>answer_choices</th>\n",
- " <th>user_prompt</th>\n",
" <th>user_prompt_RFA</th>\n",
" <th>conversation_RFA_gpt3_5</th>\n",
- " <th>conversation_RFA_mistral</th>\n",
+ " <th>conversation_RFA_falcon</th>\n",
" <th>user_prompt_FAR</th>\n",
" <th>conversation_FAR_gpt3_5</th>\n",
- " <th>conversation_FAR_mistral</th>\n",
+ " <th>conversation_FAR_falcon</th>\n",
" <th>user_prompt_FA</th>\n",
" <th>conversation_FA</th>\n",
" </tr>\n",
@@ -768,17 +1069,16 @@
" <td>What is satellite technology used for predicting?</td>\n",
" <td>c</td>\n",
" <td>a) Seconds and minutes: This option is incorre...</td>\n",
- " <td>Incorrect answers and explanations:\\n\\n1. Elec...</td>\n",
+ " <td>- (a) Seconds and minutes: Satellite technolog...</td>\n",
" <td>(a) Seconds and minutes (b) The strength and m...</td>\n",
- " <td>Question: What is satellite technology used fo...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
@@ -786,17 +1086,16 @@
" <td>What does irradiating food do?</td>\n",
" <td>c</td>\n",
" <td>(a) Relieve pain: This option is not correct b...</td>\n",
- " <td>Sure, let's examine each answer and justify wh...</td>\n",
+ " <td>(a) Relieve pain: Irradiating food does not ha...</td>\n",
" <td>(a) Relieve pain (b) Enhance food's nutrients ...</td>\n",
- " <td>Question: What does irradiating food do?\\nAnsw...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
@@ -804,17 +1103,16 @@
" <td>What protects a mammal's skin?</td>\n",
" <td>a</td>\n",
" <td>b) Exfoliation: Exfoliation is the process of ...</td>\n",
- " <td>Sure, let's go through each of the provided an...</td>\n",
+ " <td>(a) **Fiber follicles**: This is the correct a...</td>\n",
" <td>(a) Fiber follicles (b) Exfoliation (c) Resist...</td>\n",
- " <td>Question: What protects a mammal's skin?\\nAnsw...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
@@ -822,17 +1120,16 @@
" <td>What do earthworms do when a segment breaks off?</td>\n",
" <td>b</td>\n",
" <td>a) Dies: This option is not correct because ea...</td>\n",
- " <td>1. Reading the question carefully, we can see ...</td>\n",
+ " <td>1. **Option (a): Dies**\\n - Earthworms are s...</td>\n",
" <td>(a) Dies (b) Regrows it (c) Reproduces (d) Sed...</td>\n",
- " <td>Question: What do earthworms do when a segment...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
@@ -840,17 +1137,16 @@
" <td>Lightning can be bad for what?</td>\n",
" <td>a</td>\n",
" <td>b) Rainstorms: Lightning is actually a natural...</td>\n",
- " <td>1. Food: While essential for the growth and he...</td>\n",
+ " <td>(a) The environment: Lightning can release lar...</td>\n",
" <td>(a) The environment (b) Rainstorms (c) Destruc...</td>\n",
- " <td>Question: Lightning can be bad for what?\\nAnsw...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
@@ -868,7 +1164,6 @@
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
- " <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8408</th>\n",
@@ -876,17 +1171,16 @@
" <td>Organisms that can cause infection do what?</td>\n",
" <td>g</td>\n",
" <td>a) Bandaging open sores is not the correct ans...</td>\n",
- " <td>1. Read the question and options carefully: Th...</td>\n",
+ " <td>(a) Bandage open sores: This action is typical...</td>\n",
" <td>(a) Bandage open sores (b) Keep flesh clean (c...</td>\n",
- " <td>Question: Organisms that can cause infection d...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8409</th>\n",
@@ -894,17 +1188,16 @@
" <td>Fungi are living things that cannot make their...</td>\n",
" <td>a</td>\n",
" <td>b) Fungi are living things that can make their...</td>\n",
- " <td>1. Read the question and options carefully: Th...</td>\n",
+ " <td>1. **Read the question and options carefully.*...</td>\n",
" <td>(a) Food (b) Cells (c) Energy (d) Fruits (e) H...</td>\n",
- " <td>Question: Fungi are living things that cannot ...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8410</th>\n",
@@ -912,17 +1205,16 @@
" <td>An overheated body can use water for:?</td>\n",
" <td>g</td>\n",
" <td>a) Metabolic reaction: This option is incorrec...</td>\n",
- " <td>1. Read the question and options carefully: Th...</td>\n",
+ " <td>- (a) Metabolic reaction: This is incorrect be...</td>\n",
" <td>(a) Metabolic reaction (b) Dehydrating (c) Rai...</td>\n",
- " <td>Question: An overheated body can use water for...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8411</th>\n",
@@ -930,17 +1222,16 @@
" <td>What is essential for cellular respiration for...</td>\n",
" <td>f</td>\n",
" <td>a) Electrons are involved in cellular respirat...</td>\n",
- " <td>1. First, let's read the question and options ...</td>\n",
+ " <td>1. **Glucose (b)**: Glucose is one of the reac...</td>\n",
" <td>(a) Electron (b) Glucose (c) Energy (d) Energy...</td>\n",
- " <td>Question: What is essential for cellular respi...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8412</th>\n",
@@ -948,21 +1239,20 @@
" <td>What helps insulate and protect the body?</td>\n",
" <td>b</td>\n",
" <td>a) H2O: Water is essential for life, but it do...</td>\n",
- " <td>1. Read the question and options carefully: Th...</td>\n",
+ " <td>1. **Read the question and options carefully.*...</td>\n",
" <td>(a) H2o (b) Living cells in follicles (c) Laye...</td>\n",
- " <td>Question: What helps insulate and protect the ...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
- " <td>&lt;s&gt;[INST] Answer the Question and include your...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
+ " <td>[{'role': 'system', 'content': 'Answer the Que...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
- "<p>8413 rows × 15 columns</p>\n",
+ "<p>8413 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
@@ -992,18 +1282,18 @@
"8411 f a) Electrons are involved in cellular respirat... \n",
"8412 b a) H2O: Water is essential for life, but it do... \n",
"\n",
- " mistral_reasoning \\\n",
- "0 Incorrect answers and explanations:\\n\\n1. Elec... \n",
- "1 Sure, let's examine each answer and justify wh... \n",
- "2 Sure, let's go through each of the provided an... \n",
- "3 1. Reading the question carefully, we can see ... \n",
- "4 1. Food: While essential for the growth and he... \n",
+ " falcon_reasoning \\\n",
+ "0 - (a) Seconds and minutes: Satellite technolog... \n",
+ "1 (a) Relieve pain: Irradiating food does not ha... \n",
+ "2 (a) **Fiber follicles**: This is the correct a... \n",
+ "3 1. **Option (a): Dies**\\n - Earthworms are s... \n",
+ "4 (a) The environment: Lightning can release lar... \n",
"... ... \n",
- "8408 1. Read the question and options carefully: Th... \n",
- "8409 1. Read the question and options carefully: Th... \n",
- "8410 1. Read the question and options carefully: Th... \n",
- "8411 1. First, let's read the question and options ... \n",
- "8412 1. Read the question and options carefully: Th... \n",
+ "8408 (a) Bandage open sores: This action is typical... \n",
+ "8409 1. **Read the question and options carefully.*... \n",
+ "8410 - (a) Metabolic reaction: This is incorrect be... \n",
+ "8411 1. **Glucose (b)**: Glucose is one of the reac... \n",
+ "8412 1. **Read the question and options carefully.*... \n",
"\n",
" answer_choices \\\n",
"0 (a) Seconds and minutes (b) The strength and m... \n",
@@ -1018,127 +1308,114 @@
"8411 (a) Electron (b) Glucose (c) Energy (d) Energy... \n",
"8412 (a) H2o (b) Living cells in follicles (c) Laye... \n",
"\n",
- " user_prompt \\\n",
- "0 Question: What is satellite technology used fo... \n",
- "1 Question: What does irradiating food do?\\nAnsw... \n",
- "2 Question: What protects a mammal's skin?\\nAnsw... \n",
- "3 Question: What do earthworms do when a segment... \n",
- "4 Question: Lightning can be bad for what?\\nAnsw... \n",
- "... ... \n",
- "8408 Question: Organisms that can cause infection d... \n",
- "8409 Question: Fungi are living things that cannot ... \n",
- "8410 Question: An overheated body can use water for... \n",
- "8411 Question: What is essential for cellular respi... \n",
- "8412 Question: What helps insulate and protect the ... \n",
- "\n",
" user_prompt_RFA \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
" conversation_RFA_gpt3_5 \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
- " conversation_RFA_mistral \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ " conversation_RFA_falcon \\\n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
" user_prompt_FAR \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
" conversation_FAR_gpt3_5 \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
- " conversation_FAR_mistral \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ " conversation_FAR_falcon \\\n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
" user_prompt_FA \\\n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
" conversation_FA \n",
- "0 [INST] Answer the Question and include your... \n",
- "1 [INST] Answer the Question and include your... \n",
- "2 [INST] Answer the Question and include your... \n",
- "3 [INST] Answer the Question and include your... \n",
- "4 [INST] Answer the Question and include your... \n",
+ "0 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "1 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "2 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "3 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "4 [{'role': 'system', 'content': 'Answer the Que... \n",
"... ... \n",
- "8408 [INST] Answer the Question and include your... \n",
- "8409 [INST] Answer the Question and include your... \n",
- "8410 [INST] Answer the Question and include your... \n",
- "8411 [INST] Answer the Question and include your... \n",
- "8412 [INST] Answer the Question and include your... \n",
+ "8408 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8409 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8410 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8411 [{'role': 'system', 'content': 'Answer the Que... \n",
+ "8412 [{'role': 'system', 'content': 'Answer the Que... \n",
"\n",
- "[8413 rows x 15 columns]"
+ "[8413 rows x 14 columns]"
]
},
- "execution_count": 13,
+ "execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
@@ -1162,7 +1439,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 22,
"id": "a50d9d6c-18e6-476d-9a40-ed7a3f699477",
"metadata": {
"tags": []
@@ -1172,7 +1449,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "* Running on local URL: http://127.0.0.1:7861\n",
+ "* Running on local URL: http://127.0.0.1:7860\n",
"\n",
"To create a public link, set `share=True` in `launch()`.\n"
]
@@ -1180,7 +1457,7 @@
{
"data": {
"text/html": [
- ""
+ ""
],
"text/plain": [
""
@@ -1193,7 +1470,7 @@
"data": {
"text/plain": []
},
- "execution_count": 14,
+ "execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
@@ -1224,13 +1501,13 @@
" gr.Markdown(\"# Prompt Browser\")\n",
" with gr.Row():\n",
" prompt_type_dropdown = gr.Dropdown(\n",
- " choices=['conversation_RFA_gpt3_5', 'conversation_RFA_mistral', 'conversation_FAR_gpt3_5', 'conversation_FAR_mistral', 'conversation_FA_gpt3_5', 'conversation_FA_mistral'],\n",
+ " choices=['conversation_RFA_gpt3_5', 'conversation_RFA_falcon', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon', 'conversation_FA'],\n",
" value='conversation_RFA_gpt3_5',\n",
" label=\"Select Prompt Type\"\n",
" )\n",
" index_display = gr.Textbox(\"0\", label=\"Index\", interactive=False)\n",
"\n",
- " prompt_display = gr.Textbox(value=df.iloc[0]['conversation_RFA_gpt3_5'], label=\"Prompt\")\n",
+ " prompt_display = gr.JSON(value=df.iloc[0]['conversation_RFA_gpt3_5'], label=\"Prompt\")\n",
" \n",
" with gr.Row():\n",
" prev_button = gr.Button(\"⬅️ Previous\")\n",
@@ -1266,21 +1543,178 @@
" )\n",
"\n",
"# Launch the app\n",
- "demo.launch(height=840)"
+ "demo.launch(height=900)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "e60d1ea0-d717-47ed-b5cf-97c32b53544e",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'json_str' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mjson_str\u001b[49m\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'json_str' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "json_str"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7903b9e7-36ee-463a-be38-06ee2614be1d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import base64\n",
+ "import json\n",
+ "from IPython.display import display, HTML\n",
+ "\n",
+ "gr_cols = ['conversation_RFA_gpt3_5', 'conversation_RFA_falcon',\n",
+ " 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon',\n",
+ " 'conversation_FA']\n",
+ "gr_df = df[gr_cols]\n",
+ "json_str = json.dumps(gr_df.head(20).to_dict())\n",
+ "encoded_data = base64.b64encode(json_str.encode()).decode()\n",
+ "\n",
+ "code = f'''\n",
+ "\n",
+ "\t\n",
+ "\t\t\n",
+ "\t\t\n",
+ "\t\n",
+ "\t\n",
+ "\t\t\n",
+ " import json\n",
+ " import gradio as gr\n",
+ " import pandas as pd\n",
+ " import base64\n",
+ "\n",
+ " encoded_data = \"{encoded_data}\"\n",
+ " decoded_data = json.loads(base64.b64decode(encoded_data).decode())\n",
+ " \n",
+ " df = pd.DataFrame(decoded_data)\n",
+ "\n",
+ "\n",
+ " # Functions to handle prompts\n",
+ " def get_prompt(index, prompt_type):\n",
+ " return df.iloc[index][prompt_type]\n",
+ " \n",
+ " def next_prompt(index, prompt_type):\n",
+ " if index < len(df) - 1:\n",
+ " index += 1\n",
+ " return index, get_prompt(index, prompt_type)\n",
+ " \n",
+ " def previous_prompt(index, prompt_type):\n",
+ " if index > 0:\n",
+ " index -= 1\n",
+ " return index, get_prompt(index, prompt_type)\n",
+ " \n",
+ " # Gradio App\n",
+ " with gr.Blocks() as demo:\n",
+ " gr.Markdown(\"# Prompt Browser\")\n",
+ " with gr.Row():\n",
+ " prompt_type_dropdown = gr.Dropdown(\n",
+ " choices=list(df.columns),\n",
+ " value=list(df.columns)[0],\n",
+ " label=\"Select Prompt Type\"\n",
+ " )\n",
+ " index_display = gr.Textbox(\"0\", label=\"Index\", interactive=False)\n",
+ " \n",
+ " prompt_display = gr.JSON(value=df.iloc[0][list(df.columns)[0]], label=\"Prompt\")\n",
+ " \n",
+ " with gr.Row():\n",
+ " prev_button = gr.Button(\"⬅️ Previous\")\n",
+ " next_button = gr.Button(\"Next ➡️\")\n",
+ " \n",
+ " # State to hold the current index\n",
+ " index_state = gr.State(value=0)\n",
+ " \n",
+ " # Button click events\n",
+ " prev_button.click(\n",
+ " fn=previous_prompt,\n",
+ " inputs=[index_state, prompt_type_dropdown],\n",
+ " outputs=[index_state, prompt_display]\n",
+ " )\n",
+ " next_button.click(\n",
+ " fn=next_prompt,\n",
+ " inputs=[index_state, prompt_type_dropdown],\n",
+ " outputs=[index_state, prompt_display]\n",
+ " )\n",
+ " \n",
+ " # Dropdown change event\n",
+ " prompt_type_dropdown.change(\n",
+ " fn=get_prompt,\n",
+ " inputs=[index_state, prompt_type_dropdown],\n",
+ " outputs=prompt_display\n",
+ " )\n",
+ " \n",
+ " # Update index display\n",
+ " index_state.change(\n",
+ " fn=lambda index: str(index),\n",
+ " inputs=index_state,\n",
+ " outputs=index_display\n",
+ " )\n",
+ " \n",
+ " # Launch the app\n",
+ " demo.launch(height=900)\n",
+ " \n",
+ " \n",
+ "\t\n",
+ "\n",
+ "'''\n",
+ "\n",
+ "display(HTML(code))"
]
},
{
"cell_type": "markdown",
- "id": "3838daf0-8a3a-4513-ad2b-3589d60dfa3d",
+ "id": "c086d26e-4c90-4b31-9ae2-77a3bdeffdfd",
"metadata": {},
"source": [
"## Push Dataset to the Hub\n",
+ "There is a catch in our format: `content` is currently a `dict` for the \"assistant\" role, while for \"system\" and \"user\" it's a string. `Datasets` is built on Parquet/Arrow, which requires each column to have a fixed type, so `content` must always be a `str` or always a `dict`. I'll cast it to `str` for simplicity."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3ed415c2-cdc7-4549-8edf-98030cb7c61c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cols_to_cast = ['conversation_RFA_gpt3_5', 'conversation_RFA_falcon', 'conversation_FAR_gpt3_5', 'conversation_FAR_falcon', 'conversation_FA']\n",
+ "\n",
+ "def cast_content_keys_to_string(conversation):\n",
+ " assistant_dict = conversation[2]  # index 2 is assumed to be the assistant turn, whose content is a dict\n",
+ " assistant_dict['content'] = str(assistant_dict['content'])\n",
+ " return conversation\n",
+ "\n",
+ "# Apply the function to each conversation column\n",
+ "for col in cols_to_cast:\n",
+ " df.loc[:, col] = df[col].apply(cast_content_keys_to_string)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f609d12-518f-4830-8f8b-c374dbeaba7d",
+ "metadata": {},
+ "source": [
"Its useful to get a train, test split, then we convert to `Dataset` and push to the hub. We also want to stratify on `'topic'`."
]
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": null,
"id": "25f62e9b-09f8-4912-94fd-0ded680614b2",
"metadata": {
"tags": []
@@ -1310,79 +1744,12 @@
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": null,
"id": "18a206c5-0e40-46b3-8dfb-20000789b6b5",
"metadata": {
"tags": []
},
- "outputs": [
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "0e40274a780c4da0868362339d48e6de",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "Uploading the dataset shards: 0%| | 0/1 [00:00, ?it/s]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "d8ed56081baa4b4a98af22b3d904a231",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "Creating parquet from Arrow format: 0%| | 0/7 [00:00, ?ba/s]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "c6b2c92b5aaa4943ab8b086f58455440",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "Uploading the dataset shards: 0%| | 0/1 [00:00, ?it/s]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "f4a05ea3069c4f60bca73b488f8221d5",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "Creating parquet from Arrow format: 0%| | 0/2 [00:00, ?ba/s]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "data": {
- "text/plain": [
- "CommitInfo(commit_url='https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained-mistral-tokenized/commit/f96a8487961dcfe6077df67b5351c041a4523eb1', commit_message='Upload dataset', commit_description='', oid='f96a8487961dcfe6077df67b5351c041a4523eb1', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained-mistral-tokenized', endpoint='https://huggingface.co', repo_type='dataset', repo_id='derek-thomas/labeled-multiple-choice-explained-mistral-tokenized'), pr_revision=None, pr_num=None)"
- ]
- },
- "execution_count": 16,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"# Push the dataset to the Hugging Face Hub\n",
"dataset_dict.push_to_hub(OUTPUT_DATASET)"
@@ -1391,7 +1758,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "14c279ea-4f3b-40bd-9795-edd5ea3694f7",
+ "id": "5ce2886f-1a38-4185-8561-2e3094c94a26",
"metadata": {},
"outputs": [],
"source": []
@@ -1413,7 +1780,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.10"
+ "version": "3.11.11"
}
},
"nbformat": 4,