# Quizbowl Agent Web Interface Reference
This guide explains all elements of the web interface for creating and testing quizbowl agents.
## Navigation
The interface has four main tabs:
- **Tossup Agents**: Create and test agents for tossup questions
- **Bonus Round Agents**: Create and test agents for bonus questions
- **Leaderboard**: View leaderboard of agents
- **Help**: Access documentation and support resources
## Pipeline Creation Components
Let's walk through the components of the Tossup Agent pipeline creation interface.

### Model Step Management
A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.
- **+ Add Step**: Adds a new step to your pipeline
- **Step ID**: Unique identifier for each step (A, B, C, etc.)
- **Step Name**: Descriptive name for the step
- Available when the pipeline has more than one model step:
  - **Delete Step** (×): Removes a step from the pipeline
  - **Move Up** (↑): Moves a step up in the pipeline
  - **Move Down** (↓): Moves a step down in the pipeline
### Model Selection
- **Model Dropdown**: Select language model provider and model
- **Temperature Slider**: Adjust randomness of outputs (0.0-1.0)
  - Lower values (0.1-0.3): More consistent, deterministic outputs
  - Higher values (0.7-1.0): More creative, varied outputs
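
To illustrate why lower temperatures give more consistent outputs, here is a minimal sketch of temperature-scaled softmax sampling. It is a general illustration, not the interface's implementation; the `logits` values and the function name `softmax_with_temperature` are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 1.0)  # flatter, more varied sampling
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature nearly all probability mass sits on the highest-scoring token, so repeated runs tend to produce the same output.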
### System Prompt
- **System Prompt Tab**: Contains instructions for the model
- **Text Editor**: Edit instructions directly; changes are applied to the system prompt when the editor loses focus
### Input/Output Configuration
#### Inputs Tab

- **Variable Used**: Reference name in pipeline (e.g., question_text)
- **Input Name**: Name the model sees (e.g., question)
- **Description**: Explains the input's purpose
- **+ Button**: Adds a new input variable
- **× Button**: Removes an input variable
#### Outputs Tab

- **Output Field**: Name of the output variable (e.g., answer)
- **Type Dropdown**: Data type (str, float, list, bool)
- **Description**: Explains what the output represents
- **Arrow Buttons**: Change output order
- **+ Button**: Adds a new output
- **× Button**: Removes an output
### Output Panel

#### Output Variables
Tossup agents must produce the following output variables:
- `answer`: The answer to the input question
- `confidence`: The confidence score of the answer
#### Buzzer Settings (For Tossup Agents)
- **Confidence Threshold**: Minimum value of the `confidence` output variable to consider a buzz (0.0-1.0)
- **Buzz Probability**: Minimum value of the normalized probability of the LLM's output tokens. It is computed from the `logprobs` of the output tokens: $p(y|x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Note that only some models support `logprobs`.
- **Method Dropdown**:
  - AND: Both conditions must be true to buzz
  - OR: Any condition can trigger a buzz
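
The buzzer settings above can be sketched as a small decision function. This is a minimal sketch, not the interface's actual code: the function names are hypothetical, and falling back to the confidence check alone when a model returns no `logprobs` is an assumption.

```python
import math

def sequence_probability(token_logprobs):
    """p(y|x) = exp(sum of the per-token logprobs)."""
    return math.exp(sum(token_logprobs))

def should_buzz(confidence, token_logprobs, conf_threshold, prob_threshold, method="AND"):
    """Combine the confidence and buzz-probability checks with AND or OR."""
    conf_ok = confidence >= conf_threshold
    if token_logprobs is None:
        # Assumption: without logprobs, only the confidence check is used.
        return conf_ok
    prob_ok = sequence_probability(token_logprobs) >= prob_threshold
    if method == "AND":
        return conf_ok and prob_ok
    if method == "OR":
        return conf_ok or prob_ok
    raise ValueError(f"unknown method: {method!r}")

# Hypothetical values: three output tokens and a self-reported confidence of 0.9.
logprobs = [-0.1, -0.2, -0.05]
print(sequence_probability(logprobs))
print(should_buzz(0.9, logprobs, conf_threshold=0.8, prob_threshold=0.75, method="AND"))
print(should_buzz(0.9, logprobs, conf_threshold=0.8, prob_threshold=0.75, method="OR"))
```

In this example the confidence check passes but the sequence probability falls below 0.75, so AND does not buzz while OR does.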
## Testing Components
### Question Selection
- **Question ID**: Enter ID to load specific question
- **Sample Question**: Use provided sample
- **Run Button**: Process question with current pipeline
### Results Visualization
#### Tossup Visualization

- **Highlighted Question Text**:
  - Highlighted tokens mark the positions where the model is probed with the question text up to that point
  - Gray/green/red highlighting based on whether the model has buzzed, buzzed correctly, or buzzed incorrectly
  - Hover for answer/confidence details
- **Answer Popup**:
  - Shows final answer
  - Displays confidence score
  - Indicates correctness
- **Buzz Confidence Graph**:
  - X-axis: Token position
  - Y-axis: Confidence (0.0-1.0)
  - Blue line: Confidence progression
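
The probing behavior behind this visualization can be sketched as running the agent on growing prefixes of the question and recording the confidence at each position. The code below is an illustration only: `probe_prefixes`, `first_buzz`, and `toy_agent` are hypothetical names, and the toy agent stands in for a real pipeline.

```python
def probe_prefixes(tokens, agent, step=1):
    """Run the agent on growing prefixes; return (position, answer, confidence) tuples."""
    trace = []
    for end in range(step, len(tokens) + 1, step):
        answer, confidence = agent(" ".join(tokens[:end]))
        trace.append((end, answer, confidence))
    return trace

def first_buzz(trace, threshold):
    """Return (position, answer) at the first probe whose confidence meets the threshold."""
    for position, answer, confidence in trace:
        if confidence >= threshold:
            return position, answer
    return None

def toy_agent(prefix):
    # Stand-in for a real pipeline: confidence simply grows with the prefix length.
    n_tokens = len(prefix.split())
    return "Paris", min(1.0, n_tokens / 10)

tokens = "This city on the Seine is the capital of France".split()
trace = probe_prefixes(tokens, toy_agent)
print(trace)
print(first_buzz(trace, threshold=0.65))
```

The confidence values in `trace` correspond to the points on the graph, and the position returned by `first_buzz` is where the buzz would appear in the highlighted text.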
#### Bonus Visualization
- **Question Display**: Shows leadin and parts
- **Results Table**:
  - Part number
  - Correctness indicator
  - Confidence score
  - Prediction
  - Explanation
## Pipeline Management
### Import/Export

- **Select Pipeline to Import** dropdown: Load existing pipeline configuration
- **Import Pipeline**: Apply selected pipeline configuration

- **Export Pipeline**: Save configuration as YAML
- **Pipeline Preview**: View and edit pipeline configuration in YAML format
### Evaluation and Submission
- **Evaluate**: Run comprehensive assessment
- **Model Name**: Name for submission
- **Description**: Details about your agent
- **Sign in with Hugging Face**: Authentication
- **Submit**: Submit agent for official evaluation
## Tips for Effective Use
- Use the system prompt to give clear instructions
- Test different confidence thresholds to find optimal settings
- Monitor buzz positions in the visualization
- Examine confidence trends to identify problem areas
- Use multi-step pipelines for complex tasks
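
As one way to act on the threshold tip, the sketch below sweeps candidate confidence thresholds over recorded runs and counts how many buzzes each produces and how many are correct. The `runs` data is made up for illustration, and `sweep_thresholds` is a hypothetical helper, not part of the interface.

```python
def sweep_thresholds(runs, thresholds):
    """For each threshold, count buzzes and correct buzzes across all runs.

    Each run is a list of (confidence, is_correct) pairs in probe order;
    the agent buzzes at the first probe that meets the threshold.
    """
    results = []
    for threshold in thresholds:
        buzzes = correct = 0
        for run in runs:
            for confidence, is_correct in run:
                if confidence >= threshold:
                    buzzes += 1
                    correct += int(is_correct)
                    break  # only the first buzz per question counts
        results.append({"threshold": threshold, "buzzes": buzzes, "correct": correct})
    return results

# Made-up probe traces for three questions.
runs = [
    [(0.3, False), (0.6, True), (0.9, True)],
    [(0.5, False), (0.7, False), (0.8, True)],
    [(0.2, False), (0.4, False), (0.6, False)],
]
results = sweep_thresholds(runs, [0.5, 0.8])
for row in results:
    print(row)
```

A lower threshold buzzes earlier and more often but with more wrong answers; a higher one buzzes later and less often, which is the trade-off to look for when tuning.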