Quizbowl Agent Web Interface Reference
This guide explains all elements of the web interface for creating and testing quizbowl agents.
Navigation
The interface has four main tabs:
- Tossup Agents: Create and test agents for tossup questions
- Bonus Round Agents: Create and test agents for bonus questions
- Leaderboard: View the agent leaderboard
- Help: Access documentation and support resources
Pipeline Creation Components
Let's walk through the components of the Tossup Agent pipeline creation interface.
Model Step Management
A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.
- + Add Step: Adds a new step to your pipeline
- Step ID: Unique identifier for each step (A, B, C, etc.)
- Step Name: Descriptive name for the step
- Available when the pipeline has more than one model step:
  - Delete Step (×): Removes a step from the pipeline
  - Move Up (↑): Moves a step up in the pipeline
  - Move Down (↓): Moves a step down in the pipeline
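The step controls above amount to operations on an ordered list of steps. The sketch below is a hypothetical model of that behavior, not the interface's own code. The `Pipeline` and `Step` classes, the rule that a new step gets the first unused letter as its ID, and the assumption that a step keeps its ID when it is moved are all illustrative assumptions.

```python
import string
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str  # e.g. "A", "B", "C"
    name: str     # descriptive step name


@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def add_step(self, name: str) -> Step:
        # Assumed ID rule: the first letter not already used by another step.
        used = {s.step_id for s in self.steps}
        step_id = next(c for c in string.ascii_uppercase if c not in used)
        step = Step(step_id, name)
        self.steps.append(step)
        return step

    def _require_multiple(self) -> None:
        # Delete/move controls only apply when there is more than one step.
        if len(self.steps) < 2:
            raise ValueError("this action needs more than one model step")

    def delete_step(self, index: int) -> None:
        self._require_multiple()
        del self.steps[index]

    def move_up(self, index: int) -> None:
        self._require_multiple()
        if index > 0:
            s = self.steps
            s[index - 1], s[index] = s[index], s[index - 1]

    def move_down(self, index: int) -> None:
        self._require_multiple()
        if index < len(self.steps) - 1:
            s = self.steps
            s[index + 1], s[index] = s[index], s[index + 1]


pipeline = Pipeline()
pipeline.add_step("Extract clues")
pipeline.add_step("Answer")
pipeline.move_down(0)
print([(s.step_id, s.name) for s in pipeline.steps])
# [('B', 'Answer'), ('A', 'Extract clues')]
```

Reusing the first free letter avoids duplicate IDs after a deletion; the sketch stops at 26 steps, which is an arbitrary simplification.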
Model Selection
- Model Dropdown: Select language model provider and model
- Temperature Slider: Adjust the randomness of outputs (0.0-1.0)
  - Lower values (0.1-0.3): More consistent, near-deterministic outputs
  - Higher values (0.7-1.0): More creative, varied outputs
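In the standard formulation, temperature divides the model's logits before the softmax, so lower values concentrate probability on the top token and higher values spread it out. The sketch below illustrates that general mechanism; it assumes nothing about how any particular provider in the Model Dropdown implements sampling, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Many APIs treat 0.0 as greedy decoding instead of dividing by zero.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # top token dominates
print(softmax_with_temperature(logits, 1.0))  # probability is more spread out
```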
System Prompt
- System Prompt Tab: Contains instructions for the model
- Text Editor: Edit the instructions directly; changes are applied to the system prompt when the editor loses focus (for example, when you click outside it)
Input/Output Configuration
Inputs Tab
- Variable Used: Reference name in pipeline (e.g., question_text)
- Input Name: Name the model sees (e.g., question)
- Description: Explains the input's purpose
- + Button: Adds a new input variable
- × Button: Removes an input variable
Outputs Tab
- Output Field: Name of the output variable (e.g., answer)
- Type Dropdown: Data type (str, float, list, bool)
- Description: Explains what the output represents
- Arrow Buttons: Change output order
- + Button: Adds a new output
- × Button: Removes an output
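One way to picture the Inputs and Outputs tabs is as a mapping between a shared pipeline state and each model call: inputs are looked up by their "Variable Used" name and renamed to the "Input Name" the model sees, and outputs are type-checked against the Type Dropdown. The sketch below is hypothetical: `InputSpec`, `OutputSpec`, `run_step`, `fake_model`, and the `"A.answer"` naming scheme for step outputs are assumptions for illustration, not the interface's documented internals.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InputSpec:
    variable: str          # "Variable Used": name in the pipeline state
    name: str              # "Input Name": name the model sees
    description: str = ""


@dataclass
class OutputSpec:
    output_field: str      # "Output Field", e.g. "answer"
    dtype: type            # one of str, float, list, bool
    description: str = ""


def run_step(
    step_id: str,
    inputs: list[InputSpec],
    outputs: list[OutputSpec],
    call_model: Callable[[dict[str, Any]], dict[str, Any]],
    state: dict[str, Any],
) -> dict[str, Any]:
    # Look up each input in the shared state and rename it for the model.
    model_inputs = {spec.name: state[spec.variable] for spec in inputs}
    raw = call_model(model_inputs)
    for spec in outputs:
        value = raw[spec.output_field]
        if not isinstance(value, spec.dtype):
            raise TypeError(f"{spec.output_field} should be {spec.dtype.__name__}")
        # Hypothetical naming scheme so later steps can refer to "A.answer".
        state[f"{step_id}.{spec.output_field}"] = value
    return state


def fake_model(inputs: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for an LLM call; it ignores its inputs.
    return {"answer": "Paris", "confidence": 0.9}


state = run_step(
    "A",
    [InputSpec("question_text", "question")],
    [OutputSpec("answer", str), OutputSpec("confidence", float)],
    fake_model,
    {"question_text": "This city on the Seine is home to the Louvre."},
)
print(state["A.answer"], state["A.confidence"])
```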
Output Panel
Output Variables
Tossup agents must produce the following output variables:
- answer: The answer to the input question
- confidence: The confidence score of the answer
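A quick way to check that a tossup pipeline's final outputs include both required variables is sketched below. The helper name `check_tossup_outputs` is hypothetical, and it assumes `answer` is a string and `confidence` a float in [0.0, 1.0], matching the range used by the buzzer threshold and the confidence graph.

```python
REQUIRED_TOSSUP_OUTPUTS = {"answer": str, "confidence": float}


def check_tossup_outputs(outputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the outputs look usable."""
    problems = []
    for name, expected in REQUIRED_TOSSUP_OUTPUTS.items():
        if name not in outputs:
            problems.append(f"missing output variable: {name}")
        elif not isinstance(outputs[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    conf = outputs.get("confidence")
    # Assumed range, matching the 0.0-1.0 buzzer threshold.
    if isinstance(conf, float) and not 0.0 <= conf <= 1.0:
        problems.append("confidence should lie in [0.0, 1.0]")
    return problems


print(check_tossup_outputs({"answer": "Paris", "confidence": 0.8}))  # []
print(check_tossup_outputs({"answer": "Paris"}))
```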
Buzzer Settings (For Tossup Agents)
- Confidence Threshold: Minimum value of the confidence output variable required to buzz (0.0-1.0)
- Buzz Probability: Minimum value of the normalized probability of the LLM's output tokens, computed from the logprobs of those tokens: $p(y|x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Only some models support logprobs.
- Method Dropdown:
  - AND: Both conditions must be true to buzz
  - OR: Either condition can trigger a buzz
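The buzz rule described above can be sketched as follows, using the formula $p(y|x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$ for the buzz probability. The function names are hypothetical, the use of `>=` reads "minimum value" as inclusive, and the fallback to a confidence-only check when a model returns no logprobs is an assumption, not documented interface behavior.

```python
import math
from typing import Optional


def buzz_probability(token_logprobs: list[float]) -> float:
    # p(y|x) = exp(sum of the output tokens' logprobs)
    return math.exp(sum(token_logprobs))


def should_buzz(
    confidence: float,
    token_logprobs: Optional[list[float]],
    confidence_threshold: float,
    prob_threshold: float,
    method: str = "AND",
) -> bool:
    conf_ok = confidence >= confidence_threshold
    if token_logprobs is None:
        # Assumed fallback for models without logprobs: confidence only.
        return conf_ok
    prob_ok = buzz_probability(token_logprobs) >= prob_threshold
    if method == "AND":
        return conf_ok and prob_ok
    if method == "OR":
        return conf_ok or prob_ok
    raise ValueError(f"unknown method: {method}")


# exp(-0.1 + -0.2) ≈ 0.741, below a 0.9 probability threshold
print(should_buzz(0.85, [-0.1, -0.2], 0.8, 0.9, "AND"))  # False
print(should_buzz(0.85, [-0.1, -0.2], 0.8, 0.9, "OR"))   # True
```

AND makes the agent more cautious, since both signals must agree; OR buzzes earlier but risks more incorrect buzzes.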
Testing Components
Question Selection
- Question ID: Enter ID to load specific question
- Sample Question: Use the provided sample question
- Run Button: Run the current pipeline on the selected question
Results Visualization
Tossup Visualization
Highlighted Question Text:
- Highlighted tokens mark the positions where the model is probed with the question text up to that point
- Gray, green, or red highlighting indicates whether the model has not yet buzzed, buzzed correctly, or buzzed incorrectly
- Hover over a highlighted token to see the answer and confidence at that point
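The probing that the highlighted tokens represent can be sketched as a loop over growing prefixes of the question that stops at the first buzz. This is a hypothetical illustration: `first_buzz`, `predict`, and `decide` are made-up names, and whitespace splitting stands in for whatever tokenization the interface actually uses.

```python
from typing import Callable, Optional


def first_buzz(
    question_tokens: list[str],
    probe_positions: list[int],
    predict: Callable[[str], tuple[str, float]],
    decide: Callable[[float], bool],
) -> Optional[tuple[int, str, float]]:
    """Probe the model on growing question prefixes; return the first buzz.

    predict(prefix_text) -> (answer, confidence)   # hypothetical model call
    decide(confidence) -> bool                      # e.g. a threshold check
    Returns (position, answer, confidence), or None if the model never buzzes.
    """
    for pos in probe_positions:
        prefix = " ".join(question_tokens[:pos])
        answer, confidence = predict(prefix)
        if decide(confidence):
            return pos, answer, confidence
    return None


tokens = "This French city hosts the Louvre museum".split()


def toy_predict(text: str) -> tuple[str, float]:
    # Toy model whose confidence grows with the number of words seen.
    return "Paris", min(1.0, len(text.split()) / 5)


print(first_buzz(tokens, [2, 4, 6], toy_predict, lambda c: c >= 0.8))
```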
Answer Popup:
- Shows final answer
- Displays confidence score
- Indicates correctness
Buzz Confidence Graph:
- X-axis: Token position
- Y-axis: Confidence (0.0-1.0)
- Blue line: Confidence progression
Bonus Visualization
- Question Display: Shows the leadin and each part
- Results Table:
  - Part number
  - Correctness indicator
  - Confidence score
  - Prediction
  - Explanation
Pipeline Management
Import/Export
- Select Pipeline to Import dropdown: Load existing pipeline configuration
- Import Pipeline: Apply selected pipeline configuration
- Export Pipeline: Save configuration as YAML
- Pipeline Preview: View and edit pipeline configuration in YAML format
Evaluation and Submission
- Evaluate: Run a comprehensive evaluation of your agent
- Model Name: Name for submission
- Description: Details about your agent
- Sign in with Hugging Face: Authenticate with your Hugging Face account
- Submit: Submit agent for official evaluation
Tips for Effective Use
- Use the system prompt to give clear instructions
- Test different confidence thresholds to find optimal settings
- Monitor buzz positions in the visualization
- Examine confidence trends to identify problem areas
- Use multi-step pipelines for complex tasks
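To compare confidence thresholds, one option is to record each question's confidence and correctness at every probe position, then replay those records against several thresholds. The sketch below is a hypothetical offline helper with made-up data; it counts correct and incorrect buzzes only and ignores how early each buzz happens, which also matters in quizbowl.

```python
def sweep_thresholds(
    runs: list[list[tuple[float, bool]]],
    thresholds: list[float],
) -> dict[float, tuple[int, int]]:
    """Map each threshold to (correct buzzes, incorrect buzzes).

    runs: one list per question of (confidence, is_correct) pairs,
    one pair per probe position, in question order.
    """
    results = {}
    for t in thresholds:
        correct = incorrect = 0
        for probes in runs:
            for confidence, is_correct in probes:
                if confidence >= t:  # buzz at the first probe that clears t
                    if is_correct:
                        correct += 1
                    else:
                        incorrect += 1
                    break
        results[t] = (correct, incorrect)
    return results


runs = [
    [(0.3, False), (0.6, False), (0.9, True)],  # hypothetical question 1
    [(0.7, False), (0.95, True)],               # hypothetical question 2
]
print(sweep_thresholds(runs, [0.5, 0.8]))  # {0.5: (0, 2), 0.8: (2, 0)}
```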