Maharshi Gor
# Quizbowl Agent Web Interface Reference
This guide explains all elements of the web interface for creating and testing quizbowl agents.
## Navigation
The interface has four main tabs:
- **Tossup Agents**: Create and test agents for tossup questions
- **Bonus Round Agents**: Create and test agents for bonus questions
- **Leaderboard**: View the agent leaderboard
- **Help**: Access documentation and support resources
## Pipeline Creation Components
Let's walk through the components of the Tossup Agent pipeline creation interface.
![Tossup Agent Pipeline Creation Interface](./imgs/tossup-agent-pipeline.png)
### Model Step Management
A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.
- **+ Add Step**: Adds a new step to your pipeline
- **Step ID**: Unique identifier for each step (A, B, C, etc.)
- **Step Name**: Descriptive name for the step
- Available when the pipeline has more than one model step:
  - **Delete Step** (×): Removes a step from the pipeline
  - **Move Up** (↑): Moves a step up in the pipeline
  - **Move Down** (↓): Moves a step down in the pipeline
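The add, delete, and reorder controls above can be pictured as operations on an ordered list of steps. This is a minimal sketch, not the app's actual code: `ModelStep`, `Pipeline`, and their methods are hypothetical names, and reusing the first free letter for a new step ID is an assumption.

```python
from dataclasses import dataclass, field
import string


@dataclass
class ModelStep:
    """One LLM call in the pipeline (hypothetical representation)."""
    step_id: str
    name: str


@dataclass
class Pipeline:
    steps: list[ModelStep] = field(default_factory=list)

    def add_step(self, name: str) -> ModelStep:
        # Assumption: the new step gets the first unused letter (A, B, C, ...).
        used = {s.step_id for s in self.steps}
        step_id = next(c for c in string.ascii_uppercase if c not in used)
        step = ModelStep(step_id, name)
        self.steps.append(step)
        return step

    def delete_step(self, step_id: str) -> None:
        # Mirrors the UI: deletion is only offered when more than one step exists.
        if len(self.steps) <= 1:
            raise ValueError("a pipeline must keep at least one step")
        self.steps = [s for s in self.steps if s.step_id != step_id]

    def move(self, step_id: str, offset: int) -> None:
        # offset=-1 moves a step up, offset=+1 moves it down; out-of-range moves are ignored.
        i = next(i for i, s in enumerate(self.steps) if s.step_id == step_id)
        j = i + offset
        if 0 <= j < len(self.steps):
            self.steps[i], self.steps[j] = self.steps[j], self.steps[i]

    def order(self) -> list[str]:
        return [s.step_id for s in self.steps]


p = Pipeline()
p.add_step("Extract clues")
p.add_step("Answer")
p.add_step("Calibrate")
p.move("C", -1)
print(p.order())  # ['A', 'C', 'B']
```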
### Model Selection
- **Model Dropdown**: Select language model provider and model
- **Temperature Slider**: Adjust randomness of outputs (0.0-1.0)
  - Lower values (0.1-0.3): More consistent, deterministic outputs
  - Higher values (0.7-1.0): More creative, varied outputs
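The slider presumably maps to the standard sampling temperature $T$, which divides the model's token scores before the softmax: $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$. The sketch below uses made-up logits to show why lower values concentrate probability on the top token. How each provider applies temperature internally is an assumption here; many treat $T = 0$ as greedy decoding, which this sketch does not model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At $T = 0.2$ almost all of the probability lands on the first token, while at $T = 1.0$ the other two tokens keep a noticeable share, which is what makes sampling more varied.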
### System Prompt
- **System Prompt Tab**: Contains instructions for the model
- **Text Editor**: Edit the instructions directly; click outside the editor (unfocus it) to apply your changes to the system prompt
### Input/Output Configuration
#### Inputs Tab
![Inputs Tab](./imgs/inputs-tab.png)
- **Variable Used**: Name of the variable in the pipeline (e.g., `question_text`)
- **Input Name**: Name the model sees (e.g., `question`)
- **Description**: Explains the input's purpose
- **+ Button**: Adds a new input variable
- **× Button**: Removes an input variable
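One way to read the Variable Used / Input Name split is as a rename from pipeline state to prompt text. This is a hypothetical sketch: `InputField`, `build_model_inputs`, the sample question, and the labeled-line prompt format are assumptions made for illustration, not necessarily how the app formats inputs.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    variable: str     # name in the pipeline's state, e.g. "question_text"
    name: str         # name the model sees, e.g. "question"
    description: str  # explains the input's purpose


def build_model_inputs(state: dict[str, str], fields: list[InputField]) -> str:
    """Render a step's inputs as labeled lines for the prompt (hypothetical format)."""
    lines = []
    for f in fields:
        if f.variable not in state:
            raise KeyError(f"pipeline variable {f.variable!r} is not available")
        lines.append(f"{f.name} ({f.description}): {state[f.variable]}")
    return "\n".join(lines)


state = {"question_text": "This element, symbol Fe, is the main component of steel."}
fields = [InputField("question_text", "question", "The tossup question text so far")]
print(build_model_inputs(state, fields))
```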
#### Outputs Tab
![Outputs Tab](./imgs/outputs-tab.png)
- **Output Field**: Name of the output variable (e.g., `answer`)
- **Type Dropdown**: Data type (`str`, `float`, `list`, `bool`)
- **Description**: Explains what the output represents
- **Arrow Buttons**: Change the order of outputs
- **+ Button**: Adds a new output
- **× Button**: Removes an output
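Declaring a type means the model's raw text has to be turned into a value of that type. The sketch below only illustrates that idea: `coerce_output` is hypothetical, and parsing lists as JSON arrays and accepting yes/no for booleans are assumptions, not the app's documented behavior.

```python
import json


def coerce_output(raw: str, type_name: str):
    """Convert a raw string produced by the model into the declared output type."""
    text = raw.strip()
    if type_name == "str":
        return text
    if type_name == "float":
        return float(text)
    if type_name == "bool":
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"cannot interpret {raw!r} as bool")
    if type_name == "list":
        value = json.loads(text)  # assumption: lists arrive as JSON arrays, e.g. '["a", "b"]'
        if not isinstance(value, list):
            raise ValueError(f"expected a list, got {type(value).__name__}")
        return value
    raise ValueError(f"unsupported type {type_name!r}")


outputs = {"answer": ("iron", "str"), "confidence": ("0.85", "float")}
parsed = {name: coerce_output(raw, t) for name, (raw, t) in outputs.items()}
print(parsed)  # {'answer': 'iron', 'confidence': 0.85}
```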
### Output Panel
![Buzzer Settings](./imgs/buzzer-settings.png)
#### Output Variables
Tossup agents must produce the following output variables:
- `answer`: The answer to the input question
- `confidence`: The confidence score of the answer
#### Buzzer Settings (For Tossup Agents)
- **Confidence Threshold**: Minimum value of the `confidence` output variable to consider a buzz (0.0-1.0)
- **Buzz Probability**: Minimum probability of the LLM's output tokens, computed from their `logprobs`: $p(y \mid x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Note that only some models support `logprobs`.
- **Method Dropdown**:
  - AND: Both conditions must be true to buzz
  - OR: Either condition can trigger a buzz
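The two buzzer conditions and the AND/OR combination fit in a few lines. This sketch follows the buzz-probability formula above, but `buzz_probability`, `should_buzz`, the use of `>=` for "minimum value", and treating a model without `logprobs` as failing the probability condition are assumptions rather than the app's confirmed logic.

```python
import math
from typing import Optional


def buzz_probability(token_logprobs: list[float]) -> float:
    """p(y|x) = exp(sum of the log-probabilities of the output tokens)."""
    return math.exp(sum(token_logprobs))


def should_buzz(confidence: float,
                token_logprobs: Optional[list[float]],
                confidence_threshold: float,
                probability_threshold: float,
                method: str = "AND") -> bool:
    """Combine the confidence and buzz-probability conditions with AND or OR."""
    confidence_ok = confidence >= confidence_threshold
    # Assumption: a model that returns no logprobs never meets the probability condition.
    probability_ok = (token_logprobs is not None
                      and buzz_probability(token_logprobs) >= probability_threshold)
    if method == "AND":
        return confidence_ok and probability_ok
    if method == "OR":
        return confidence_ok or probability_ok
    raise ValueError(f"unknown method {method!r}")


logprobs = [-0.05, -0.10]  # made-up logprobs for a two-token answer
print(round(buzz_probability(logprobs), 3))         # 0.861
print(should_buzz(0.9, logprobs, 0.8, 0.9, "AND"))  # False: probability 0.861 < 0.9
print(should_buzz(0.9, logprobs, 0.8, 0.9, "OR"))   # True: confidence 0.9 >= 0.8
```

AND is the stricter setting: it buzzes later but avoids buzzing on a high self-reported confidence that the token probabilities do not back up.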
## Testing Components
### Question Selection
- **Question ID**: Enter an ID to load a specific question
- **Sample Question**: Use a provided sample question
- **Run Button**: Run the current pipeline on the selected question
### Results Visualization
#### Tossup Visualization
![Tossup Results](./imgs/tossup-viz.png)
- **Highlighted Question Text**:
  - Highlighted tokens mark the positions where the model is probed with the question text up to that point
  - Gray highlighting means the model has not buzzed; green and red mean it buzzed correctly or incorrectly
  - Hover over a token for answer and confidence details
- **Answer Popup**:
  - Shows the final answer
  - Displays the confidence score
  - Indicates correctness
- **Buzz Confidence Graph**:
  - X-axis: Token position
  - Y-axis: Confidence (0.0-1.0)
  - Blue line: Confidence progression
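The highlighted probe positions and the confidence graph suggest a loop that asks the pipeline for an answer at several prefixes of the question. The sketch below illustrates that idea under stated assumptions: `run_tossup` and `toy_agent` are hypothetical; the buzz happens at the first probe whose confidence meets the threshold (the buzz-probability condition is ignored); probing continues afterward so the whole confidence curve can be plotted; and correctness is an exact, case-insensitive string match, whereas real quizbowl answer checking is usually more lenient.

```python
from typing import Callable, Optional


def run_tossup(tokens: list[str],
               probe_positions: list[int],
               agent: Callable[[str], tuple[str, float]],
               threshold: float,
               gold_answer: str) -> tuple[list[dict], Optional[int]]:
    """Probe the agent on growing prefixes of the question and record its first buzz."""
    trace: list[dict] = []
    buzz_position: Optional[int] = None
    for pos in probe_positions:
        prefix = " ".join(tokens[: pos + 1])  # question text up to and including token `pos`
        answer, confidence = agent(prefix)
        buzzed = buzz_position is None and confidence >= threshold
        correct = buzzed and answer.strip().lower() == gold_answer.strip().lower()
        if buzzed:
            buzz_position = pos
        trace.append({
            "position": pos,
            "answer": answer,
            "confidence": confidence,  # one point on the confidence graph
            "color": "gray" if not buzzed else ("green" if correct else "red"),
        })
    return trace, buzz_position


def toy_agent(prefix: str) -> tuple[str, float]:
    # Made-up stand-in for a real pipeline: confidence grows with the amount of text seen.
    return "iron", min(1.0, len(prefix.split()) / 10)


tokens = "This element with symbol Fe is the main component of steel".split()
trace, buzz_position = run_tossup(tokens, [2, 5, 8, 10], toy_agent,
                                  threshold=0.7, gold_answer="iron")
print(buzz_position)                      # 8
print([step["color"] for step in trace])  # ['gray', 'gray', 'green', 'gray']
```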
#### Bonus Visualization
- **Question Display**: Shows leadin and parts
- **Results Table**:
  - Part number
  - Correctness indicator
  - Confidence score
  - Prediction
  - Explanation
## Pipeline Management
### Import/Export
![Import Pipeline](./imgs/import-pipeline.png)
- **Select Pipeline to Import** dropdown: Choose an existing pipeline configuration to load
- **Import Pipeline**: Apply the selected pipeline configuration
![Export Pipeline](./imgs/pipeline-preview.png)
- **Export Pipeline**: Save configuration as YAML
- **Pipeline Preview**: View and edit pipeline configuration in YAML format
### Evaluation and Submission
- **Evaluate**: Run a full evaluation of your agent
- **Model Name**: Name for your submission
- **Description**: Details about your agent
- **Sign in with Hugging Face**: Authenticate with your Hugging Face account
- **Submit**: Submit agent for official evaluation
## Tips for Effective Use
- Use the system prompt to give clear instructions
- Test different confidence thresholds to find optimal settings
- Monitor buzz positions in the visualization
- Examine confidence trends to identify problem areas
- Use multi-step pipelines for complex tasks