
Maharshi Gor


	# Quizbowl Agent Web Interface Walkthrough

	This walkthrough helps you create, test, and submit a quizbowl agent for both tossup and bonus questions using our web interface.

	## Overview

	Our web interface allows you to:
	- Create and import pipeline workflows
	- Configure tossup and bonus agents
	- Test agents on sample questions
	- Visualize agent performance
	- Export your pipeline configurations to YAML files
	- Submit your agents to our competition for full evaluation

	## Creating a Tossup Agent

	1. Navigate to the "Tossup Agents" tab at the top of the interface.

	2. Creating a Pipeline:
	- Note the input variable `question_text` and output variables `answer`, `confidence` required for tossup agents.
	- The default setup includes a single agent step labeled "A: Tossup Agent".
	- You can add more steps using the "+ Add Step" button for multi-step pipelines.

	3. Configuring Your Agent:
	- Select your preferred model from the dropdown (e.g., `OpenAI/gpt-4o-mini`).
	- Adjust the temperature slider (higher values give more varied output, lower values give more deterministic output).
	- Click on the "System Prompt" tab to customize your agent's instructions.
	- Your system prompt is crucial: it tells the LLM how to interpret questions and format answers.

	4. Managing Input and Output Variables:

	![Inputs Tab](./imgs/inputs-tab.png)
	- Click the "Inputs" tab to view and modify input variables. Each input has:
	  - "Variable Used": how it's referenced in the pipeline
	  - "Input Name": what the model sees
	  - "Description": explains the input's purpose
	- Use the "+" button to add a new input variable
	- Use the "×" button to remove an input variable

	![Outputs Tab](./imgs/outputs-tab.png)
	- Click the "Outputs" tab to manage output variables. Each output has:
	  - "Output Field": the output's name
	  - "Type": a dropdown (str, float, etc.) that defines the data type
	  - "Description": explains the output's purpose
	- Use the arrow buttons to change the output order
	- Use the "+" button to add a new output
	- Use the "×" button to remove an output
	- When adding a new output, consider the following types:
	  - `str`: for text outputs like answers
	  - `float`: for numerical outputs like confidence scores (0.0-1.0)
	  - `bool`: for true/false outputs
	  - `list`: for array outputs like multiple answer candidates
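
The output types above can be checked with a few lines of Python before you rely on a step's result. This is a minimal sketch, not part of the interface: `validate_outputs` and the `candidates` field are hypothetical names used only for illustration, and the interface may perform its own validation.

```python
# Map the type names from the "Type" dropdown to Python types.
TYPE_MAP = {"str": str, "float": float, "bool": bool, "list": list}


def validate_outputs(values: dict, declared: dict) -> list:
    """Return a list of human-readable problems; empty means all fields match."""
    problems = []
    for field, type_name in declared.items():
        if field not in values:
            problems.append(f"{field}: missing")
            continue
        expected = TYPE_MAP[type_name]
        value = values[field]
        ok = isinstance(value, expected)
        if expected is float:
            # Accept ints as floats, but reject bools (bool subclasses int).
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not ok:
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


# Hypothetical declared outputs: two required tossup fields plus an extra list.
declared = {"answer": "str", "confidence": "float", "candidates": "list"}
good = {"answer": "Freud", "confidence": 0.8, "candidates": ["Freud", "Jung"]}
bad = {"answer": "Freud", "confidence": "high"}

print(validate_outputs(good, declared))
print(validate_outputs(bad, declared))
```

A check like this is most useful while iterating on a system prompt, since a model that returns `"high"` instead of a number would otherwise only surface as a buzzer that never fires.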

	5. Buzzer Settings:

	![Buzzer Settings](./imgs/buzzer-settings.png)
	- Scroll down to the "Buzzer settings" section.
	- Set your confidence threshold (e.g., 0.85). Your agent buzzes only when its confidence exceeds this value.
	- Choose a method (AND/OR) if combining multiple conditions.
	- Adjust probability settings if needed.
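
As a rough mental model of these settings, the buzz decision can be sketched as a threshold test optionally combined with a second condition. The function below is an assumption for illustration only: `should_buzz` and `second_check` are hypothetical names, and the interface's actual buzzer logic (including its probability settings) may differ.

```python
def should_buzz(confidence: float, threshold: float = 0.85,
                second_check: bool = True, method: str = "AND") -> bool:
    """Decide whether to buzz.

    `second_check` is a hypothetical stand-in for any additional condition;
    `method` says how it is combined with the confidence test.
    """
    confident = confidence > threshold  # buzz only when confidence exceeds it
    if method == "AND":
        return confident and second_check
    if method == "OR":
        return confident or second_check
    raise ValueError(f"unknown method: {method!r}")


print(should_buzz(0.90))                                    # above threshold
print(should_buzz(0.85))                                    # equal, not above
print(should_buzz(0.50, second_check=True, method="OR"))    # OR lets it through
print(should_buzz(0.90, second_check=False, method="AND"))  # AND blocks it
```

With AND, every condition must hold, so the agent buzzes less often; with OR, any single condition is enough, so it buzzes more often.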

	6. Testing Your Agent:
	- Enter a Question ID or use the provided sample question.
	- Check "Early Stop" if you want to stop processing once the agent buzzes.
	- Click "Run on Tossup Question" to test your agent.
	- Review the answer and click "Buzz Confidence" to see confidence metrics.
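
The effect of "Early Stop" can be pictured as a loop over growing prefixes of the question that ends at the first buzz. The sketch below assumes the agent is evaluated after every word, which may be finer-grained than what the interface does; `run_tossup` and `toy_agent` are hypothetical names, and the toy agent simply grows more confident as it sees more words.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, str, float]  # (word position, answer, confidence)


def run_tossup(question_text: str,
               agent: Callable[[str], Tuple[str, float]],
               threshold: float = 0.85,
               early_stop: bool = True) -> List[Step]:
    """Feed growing prefixes of the question to `agent` and record each guess."""
    words = question_text.split()
    trace: List[Step] = []
    for i in range(1, len(words) + 1):
        answer, confidence = agent(" ".join(words[:i]))
        trace.append((i, answer, confidence))
        if early_stop and confidence > threshold:
            break  # the agent buzzed; skip the remaining clues
    return trace


def toy_agent(prefix: str) -> Tuple[str, float]:
    """Hypothetical agent: confidence grows with the number of words seen."""
    n = len(prefix.split())
    return "Sigmund Freud", min(1.0, 0.2 * n)


question = "This thinker treated Emma Eckstein and wrote about dreams"
stopped = run_tossup(question, toy_agent)
full = run_tossup(question, toy_agent, early_stop=False)
print(len(stopped), len(full))
```

Running without early stop costs more model calls but yields a complete confidence trace, which is what you need to judge whether a different threshold would have buzzed earlier or later.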

	7. Understanding Tossup Results Visualization:

	![Tossup Results Visualization showing a question about Sigmund Freud with a confidence graph](./imgs/tossup-viz.png)

	After running a tossup question, you'll see the results displayed in several ways:
	- Highlighted Question Text:
	  - Key terms are highlighted at each point in the question where the agent was evaluated.
	  - Click any highlighted word to see an answer popup with the confidence at that point.
	  - The buzz point (e.g., "Eckstein's") appears in green or red, depending on whether the agent was correct.
	- Answer Popup:
	  - Displays the final answer, confidence score, and correctness indicator
	  - Appears when you hover over the buzz point or a highlighted term
	- Buzz Confidence Graph:
	  - X-axis: token position; Y-axis: confidence (0.0-1.0)
	  - Blue line shows the confidence progression
	  - Amber vertical line marks the buzz point
	  - Dashed horizontal line shows the confidence threshold

	This visualization helps you evaluate:
	- Which clues are most informative
	- Whether the confidence threshold is well calibrated
	- Whether the agent should be more aggressive or more conservative

	8. Exporting Your Pipeline:
	- Once you're satisfied with your agent, click "Export Pipeline" to save your configuration.
	- Click the "Pipeline Preview" dropdown to see the YAML configuration.
	- You can download this configuration for future use or submission.

	## Creating a Bonus Agent

	1. Navigate to the "Bonus Round Agents" tab at the top of the interface.

	2. Creating a Pipeline:
	- Note the input variables `leadin`, `part` and output variables `answer`, `confidence`, `explanation` required for bonus agents.
	- The default setup includes a single agent step labeled "A: Bonus Agent".

	3. Configuring Your Agent:
	- Select your preferred model from the dropdown.
	- Adjust the temperature slider.
	- Customize the system prompt to provide instructions for handling bonus questions.
	- Include clear formatting expectations for answer, confidence, and explanation.
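
One way to keep formatting expectations explicit is to generate the system prompt from the list of output fields, so the prompt and the declared outputs cannot drift apart. This is only a sketch: `build_system_prompt` is a hypothetical helper, and the prompt wording and field descriptions are illustrative rather than recommended text.

```python
# Required bonus outputs from this guide; the descriptions are illustrative.
BONUS_OUTPUTS = {
    "answer": ("str", "the answer to this bonus part"),
    "confidence": ("float", "a number between 0.0 and 1.0"),
    "explanation": ("str", "one or two sentences justifying the answer"),
}


def build_system_prompt(outputs: dict) -> str:
    """Render a system prompt that spells out every expected output field."""
    lines = [
        "You are answering one part of a quizbowl bonus question.",
        "You will receive the leadin and the part text.",
        "Respond with the following fields:",
    ]
    for name, (type_name, description) in outputs.items():
        lines.append(f"- {name} ({type_name}): {description}")
    return "\n".join(lines)


prompt = build_system_prompt(BONUS_OUTPUTS)
print(prompt)
```

The printed text can be pasted into the "System Prompt" tab and edited from there.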

	4. Testing Your Agent:
	- Enter a bonus question with leadin and part text.
	- Click the appropriate run button to test your agent.
	- Review the agent's answer, confidence, and explanation.

	5. Exporting Your Pipeline:
	- Click "Export Pipeline" to save your configuration.

	## Advanced Features

	### Multi-step Pipelines

	1. Adding Steps:
	- Click "+ Add Step" to add additional processing steps.
	- Each step can use different models or system prompts.
	- Use steps for different tasks like analysis, answer generation, and confidence calculation.

	2. Variable Mapping:
	- Connect outputs from earlier steps to inputs for later steps.
	- The final output variables must map to your defined outputs (`answer`, `confidence`, etc.).
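
Variable mapping can be sketched as a shared dictionary that each step reads from and writes to. The example below is an assumption about how such a pipeline could work, not the interface's implementation: `run_pipeline`, the two step functions, and the `"<step id>.<field>"` naming are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A step reads named variables and returns named outputs.
StepFn = Callable[[Dict[str, object]], Dict[str, object]]


def run_pipeline(steps: List[Tuple[str, StepFn]],
                 inputs: Dict[str, object],
                 final_outputs: Dict[str, str]) -> Dict[str, object]:
    """Run steps in order, exposing each step's outputs as "<step id>.<field>"."""
    variables = dict(inputs)
    for step_id, fn in steps:
        for field, value in fn(variables).items():
            variables[f"{step_id}.{field}"] = value
    # Map each pipeline output to the step variable that produces it.
    return {name: variables[source] for name, source in final_outputs.items()}


def analyze(v: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical step A: propose an answer from the question text."""
    found = "Eckstein" in str(v["question_text"])
    return {"answer": "Sigmund Freud" if found else "unknown"}


def score(v: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical step B: attach a confidence to step A's answer."""
    return {"confidence": 0.2 if v["A.answer"] == "unknown" else 0.9}


result = run_pipeline(
    steps=[("A", analyze), ("B", score)],
    inputs={"question_text": "This doctor treated Emma Eckstein."},
    final_outputs={"answer": "A.answer", "confidence": "B.confidence"},
)
print(result)
```

Here step B consumes `A.answer`, and the final mapping selects one field from each step, which mirrors the split between answer generation and confidence calculation described above.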

	### Importing Existing Pipelines

	1. Click the "Select Pipeline to Import..." dropdown.
	2. Choose an existing pipeline to load its configuration.
	3. Click "Import Pipeline" to load it into the interface.
	4. Modify as needed for your specific use case.

	For a detailed example of importing and modifying a sophisticated multi-step pipeline, see our [Advanced Pipeline Examples](./advanced-pipeline-examples.md) guide, which walks through enhancing the two-step justified confidence model.

	## Submitting Your Agent

	1. Evaluate Your Agent:
	- Before submission, click "Evaluate" to run a thorough assessment.
	- This helps identify potential issues before formal submission.

	2. Model Submission:
	- Fill in the "Model Name" and "Description" fields.
	- Click "Sign in with Hugging Face" to authenticate.
	- Click "Submit" to submit your agent for official evaluation.

	## Best Practices

	1. System Prompts:
	- Be specific about output formats in your system prompts.
	- For tossups, instruct the model to provide confidence scores.
	- For bonuses, instruct the model to explain its reasoning.

	2. Confidence Calibration:
	- Tune your buzzer threshold based on testing.
	- Too high: agent might miss answerable questions.
	- Too low: agent might buzz incorrectly.
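
The trade-off between a threshold that is too high and one that is too low can be made concrete by replaying recorded confidence traces at several thresholds. The sketch below uses made-up traces and a hypothetical `sweep_thresholds` helper; in practice you would collect the `(confidence, is_correct)` pairs from your own test runs.

```python
from typing import Dict, List, Tuple

# Each run is a list of (confidence, is_correct) pairs, one per evaluation point.
Run = List[Tuple[float, bool]]


def sweep_thresholds(runs: List[Run],
                     thresholds: List[float]) -> Dict[float, Dict[str, int]]:
    """For each threshold, count correct buzzes, wrong buzzes, and no-buzz runs."""
    summary = {}
    for t in thresholds:
        correct = wrong = missed = 0
        for run in runs:
            # The first point whose confidence exceeds the threshold is the buzz.
            buzz = next((ok for conf, ok in run if conf > t), None)
            if buzz is None:
                missed += 1
            elif buzz:
                correct += 1
            else:
                wrong += 1
        summary[t] = {"correct": correct, "wrong": wrong, "missed": missed}
    return summary


# Made-up traces for three questions.
runs = [
    [(0.3, False), (0.6, False), (0.9, True)],
    [(0.4, False), (0.7, True), (0.95, True)],
    [(0.2, False), (0.5, False), (0.8, True)],
]
summary = sweep_thresholds(runs, [0.5, 0.75, 0.9])
for t, counts in summary.items():
    print(t, counts)
```

On these traces, the lowest threshold produces a wrong buzz, the highest leaves two questions unanswered, and the middle value answers all three correctly.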

	3. Testing Thoroughly:
	- Test your agent on various question types and difficulties.
	- Check performance on both early and late clues for tossups.

	Good luck with your quizbowl agent submission!