Quizbowl Agent Web Interface Reference

This guide explains all elements of the web interface for creating and testing quizbowl agents.

Navigation

The interface has four main tabs:

Let's walk through the components of the Tossup Agent pipeline creation interface.

A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.

+ Add Step: Adds a new step to your pipeline
Step ID: Unique identifier for each step (A, B, C, etc.)
Step Name: Descriptive name for the step
Available when the pipeline has more than one model step:
- Delete Step (×): Removes a step from the pipeline
- Move Up (↑): Moves a step up in the pipeline
- Move Down (↓): Moves a step down in the pipeline
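The step operations above amount to editing an ordered list of steps. The sketch below is a hypothetical illustration, not the interface's actual code. The `ModelStep` and `Pipeline` names, the placeholder model string, and the rule of assigning the next unused letter as the step ID are all assumptions.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase


@dataclass
class ModelStep:
    # Hypothetical representation of one model step (a single LLM call).
    step_id: str
    name: str
    model: str = "provider/model-name"  # placeholder, not a real model identifier
    temperature: float = 0.3
    system_prompt: str = ""


@dataclass
class Pipeline:
    steps: list[ModelStep] = field(default_factory=list)

    def add_step(self, name: str) -> ModelStep:
        # Assumption: the new step gets the first unused letter as its ID (A, B, C, ...).
        used = {s.step_id for s in self.steps}
        step_id = next(c for c in ascii_uppercase if c not in used)
        step = ModelStep(step_id=step_id, name=name)
        self.steps.append(step)
        return step

    def delete_step(self, step_id: str) -> None:
        # Mirrors the UI: deletion is only offered when there is more than one step.
        if len(self.steps) <= 1:
            raise ValueError("a pipeline must keep at least one step")
        self.steps = [s for s in self.steps if s.step_id != step_id]

    def _index(self, step_id: str) -> int:
        return next(i for i, s in enumerate(self.steps) if s.step_id == step_id)

    def move_up(self, step_id: str) -> None:
        # Swap the step with its predecessor; the first step stays in place.
        i = self._index(step_id)
        if i > 0:
            self.steps[i - 1], self.steps[i] = self.steps[i], self.steps[i - 1]

    def move_down(self, step_id: str) -> None:
        # Swap the step with its successor; the last step stays in place.
        i = self._index(step_id)
        if i < len(self.steps) - 1:
            self.steps[i + 1], self.steps[i] = self.steps[i], self.steps[i + 1]


pipeline = Pipeline()
pipeline.add_step("Extract clues")
pipeline.add_step("Guess answer")
pipeline.move_up("B")
print([s.step_id for s in pipeline.steps])  # ['B', 'A']
```

Because step order determines the order of LLM calls, moving a step only reorders the list; the step IDs stay attached to their steps.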

Model Dropdown: Select language model provider and model
Temperature Slider: Adjust randomness of outputs (0.0-1.0)
- Lower values (0.1-0.3): More consistent, deterministic outputs
- Higher values (0.7-1.0): More creative, varied outputs
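Temperature is conventionally applied by dividing the model's logits by the temperature before the softmax, which is why low values concentrate probability on the top-scoring token. The sketch below uses made-up logits for three candidate tokens; treating a temperature of exactly 0.0 as greedy (argmax) decoding is an assumption here, since providers handle that case differently.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into token probabilities at the given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: treat 0.0 as greedy decoding (all mass on the top logit).
        top = logits.index(max(logits))
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```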

System Prompt Tab: Contains instructions for the model
Text Editor: Edit the instructions directly; changes are applied to the system prompt when the editor loses focus (for example, when you click outside it)

Tossup agents must produce the following output variables:

Confidence Threshold: Minimum value of the confidence output variable to consider a buzz (0.0-1.0)
Buzz Probability: Minimum value of the normalized probability of the LLM's output tokens, computed from their logprobs as $p(y \mid x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Only some models support logprobs.
Method Dropdown:
- AND: Both conditions must be true to buzz
- OR: Either condition can trigger a buzz
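The buzz decision combines the two thresholds through the selected method, with the probability computed exactly as in the formula above. This sketch assumes the agent reports a confidence value and, when the model supports it, the list of output-token logprobs. The function names are hypothetical, and falling back to the confidence check alone when logprobs are unavailable is an assumption, not documented behavior.

```python
import math
from typing import Optional


def buzz_probability(token_logprobs: list[float]) -> float:
    # p(y|x) = exp(sum of the logprobs of the output tokens)
    return math.exp(sum(token_logprobs))


def should_buzz(
    confidence: float,
    token_logprobs: Optional[list[float]],
    confidence_threshold: float,
    probability_threshold: float,
    method: str = "AND",
) -> bool:
    """Decide whether to buzz from the confidence and (optionally) the logprobs."""
    conf_ok = confidence >= confidence_threshold
    if token_logprobs is None:
        # Assumption: without logprobs, only the confidence check is applied.
        return conf_ok
    prob_ok = buzz_probability(token_logprobs) >= probability_threshold
    if method == "AND":
        return conf_ok and prob_ok
    if method == "OR":
        return conf_ok or prob_ok
    raise ValueError(f"unknown method: {method}")


# Two output tokens with logprobs -0.05 and -0.10: p = exp(-0.15) ≈ 0.861
logprobs = [-0.05, -0.10]
print(should_buzz(0.9, logprobs, 0.85, 0.9, "AND"))  # False: 0.861 < 0.9
print(should_buzz(0.9, logprobs, 0.85, 0.9, "OR"))   # True: 0.9 >= 0.85
```

Because the probability is a product of per-token probabilities, longer answers tend to get lower values, which is worth keeping in mind when choosing the probability threshold.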

Highlighted Question Text:
- Highlighted tokens mark the positions where the model is probed with the question text revealed up to that point
- Gray, green, or red highlighting indicates that the model has not buzzed, buzzed correctly, or buzzed incorrectly at that point
- Hover over a highlighted token to see the model's answer and confidence there
Answer Popup:
- Shows final answer
- Displays confidence score
- Indicates correctness
Buzz Confidence Graph:
- X-axis: Token position
- Y-axis: Confidence (0.0-1.0)
- Blue line: Confidence progression
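The highlighting and the confidence graph both come from probing the agent with progressively longer prefixes of the question. A minimal sketch, assuming a caller-supplied list of probe positions, a stand-in `toy_agent` that returns an (answer, confidence) pair, a confidence-only buzz rule, and exact case-insensitive answer matching; all names and these simplifications are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Probe:
    position: int      # index of the last token the agent saw
    answer: str
    confidence: float
    buzzed: bool


def run_tossup(
    tokens: list[str],
    probe_positions: list[int],
    agent: Callable[[str], tuple[str, float]],
    threshold: float,
) -> list[Probe]:
    """Query the agent with the question text revealed up to each probe position."""
    probes = []
    for pos in probe_positions:
        partial_question = " ".join(tokens[: pos + 1])
        answer, confidence = agent(partial_question)
        probes.append(Probe(pos, answer, confidence, confidence >= threshold))
    return probes


def first_buzz(probes: list[Probe]) -> Optional[Probe]:
    # The earliest probe whose confidence crossed the threshold, if any.
    return next((p for p in probes if p.buzzed), None)


def highlight_color(probe: Probe, correct_answer: str) -> str:
    # Gray: no buzz; green/red: buzzed with a correct/incorrect answer.
    if not probe.buzzed:
        return "gray"
    return "green" if probe.answer.lower() == correct_answer.lower() else "red"


def toy_agent(partial_question: str) -> tuple[str, float]:
    # Stand-in for a real pipeline: confidence grows with the number of words seen.
    n_words = len(partial_question.split())
    return ("Paris" if n_words >= 6 else "London"), min(1.0, n_words / 8)


tokens = "This city on the Seine is the capital of France".split()
probes = run_tossup(tokens, [1, 3, 5, 7, 9], toy_agent, threshold=0.7)
print([highlight_color(p, "Paris") for p in probes])
print([p.confidence for p in probes])  # the values a confidence graph would plot
```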

Question Display: Shows the bonus question's lead-in and parts
Results Table:
- Part number
- Correctness indicator
- Confidence score
- Prediction
- Explanation
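Each row of the results table pairs a bonus part with the agent's output for it. A minimal sketch of one record per part and a plain-text rendering of the table; the `PartResult` fields mirror the columns listed above, while the names, the example values, and the text layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartResult:
    part: int          # part number within the bonus
    correct: bool      # correctness indicator
    confidence: float  # confidence score (0.0-1.0)
    prediction: str    # the agent's answer for this part
    explanation: str   # the agent's stated reasoning


def format_results(results: list[PartResult]) -> str:
    """Render the results as a fixed-width text table."""
    lines = [f"{'Part':<5}{'Correct':<9}{'Conf':<6}{'Prediction':<14}Explanation"]
    for r in results:
        mark = "yes" if r.correct else "no"
        lines.append(
            f"{r.part:<5}{mark:<9}{r.confidence:<6.2f}{r.prediction:<14}{r.explanation}"
        )
    return "\n".join(lines)


# Made-up example rows for illustration only.
results = [
    PartResult(1, True, 0.92, "Mitochondria", "Clue about ATP production"),
    PartResult(2, False, 0.41, "Golgi body", "Guessed from 'packaging'"),
]
print(format_results(results))
print(f"{sum(r.correct for r in results)}/{len(results)} parts correct")
```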