# Quizbowl Agent Web Interface Reference

This guide explains all elements of the web interface for creating and testing quizbowl agents.

## Navigation

The interface has four main tabs:
- **Tossup Agents**: Create and test agents for tossup questions
- **Bonus Round Agents**: Create and test agents for bonus questions 
- **Leaderboard**: View the agent leaderboard
- **Help**: Access documentation and support resources

## Pipeline Creation Components

Let's walk through the components of the Tossup Agent pipeline creation interface.

![Tossup Agent Pipeline Creation Interface](./imgs/tossup-agent-pipeline.png)

### Model Step Management

A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.
- **+ Add Step**: Adds a new step to your pipeline
- **Step ID**: Unique identifier for each step (A, B, C, etc.)
- **Step Name**: Descriptive name for the step
- Available when the pipeline has more than one model step:
  - **Delete Step** (×): Removes a step from the pipeline
  - **Move Up** (↑): Moves a step up in the pipeline
  - **Move Down** (↓): Moves a step down in the pipeline

### Model Selection

- **Model Dropdown**: Select language model provider and model
- **Temperature Slider**: Adjust randomness of outputs (0.0-1.0)
  - Lower values (0.1-0.3): More consistent, deterministic outputs
  - Higher values (0.7-1.0): More creative, varied outputs
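For intuition about the slider, the sketch below shows the standard way temperature reshapes a model's next-token distribution: the logits are divided by the temperature before the softmax. This is a generic illustration, not the interface's or any provider's actual code. `softmax_with_temperature` is a hypothetical helper, and a temperature of exactly 0.0 is commonly treated as greedy (argmax) decoding, which this sketch does not model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities, sharpened (low T) or flattened (high T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating to avoid overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # mass concentrates on the top token
high = softmax_with_temperature(logits, 1.0)  # mass spreads across tokens
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With these example logits, the 0.2 setting puts nearly all probability on the first token, while 1.0 leaves noticeable mass on the others, which is why lower temperatures yield more consistent outputs.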

### System Prompt

- **System Prompt Tab**: Contains instructions for the model
- **Text Editor**: Edit the instructions directly; changes are applied to the system prompt when the editor loses focus

### Input/Output Configuration

#### Inputs Tab

![Inputs Tab](./imgs/inputs-tab.png)

- **Variable Used**: Reference name in pipeline (e.g., question_text)
- **Input Name**: Name the model sees (e.g., question)
- **Description**: Explains the input's purpose
- **+ Button**: Adds a new input variable
- **× Button**: Removes an input variable
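One way to picture the Inputs tab: each row maps a pipeline variable (the **Variable Used**) to the name the model sees (the **Input Name**). The sketch below is a hypothetical model of that mapping. `InputField` and `build_model_inputs` are illustrative names, not part of the interface or its codebase, and the sketch assumes all pipeline variables live in a single dictionary.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    """Hypothetical mirror of one row in the Inputs tab."""
    variable: str          # "Variable Used": name in the pipeline's shared state
    name: str              # "Input Name": name the model sees
    description: str = ""  # "Description": explains the input's purpose


def build_model_inputs(state: dict[str, object], fields: list[InputField]) -> dict[str, object]:
    """Look up each pipeline variable and expose it under its model-facing name."""
    missing = [f.variable for f in fields if f.variable not in state]
    if missing:
        raise KeyError(f"pipeline variables not available: {missing}")
    return {f.name: state[f.variable] for f in fields}


state = {"question_text": "For 10 points, name this element with atomic number 79."}
fields = [InputField("question_text", "question", "The question text read so far")]
print(build_model_inputs(state, fields))
```

Keeping the pipeline name separate from the model-facing name lets you rename what the model sees without touching the rest of the pipeline.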

#### Outputs Tab

![Outputs Tab](./imgs/outputs-tab.png)

- **Output Field**: Name of the output variable (e.g., answer)
- **Type Dropdown**: Data type (str, float, list, bool)
- **Description**: Explains what the output represents
- **Arrow Buttons**: Change output order
- **+ Button**: Adds a new output
- **× Button**: Removes an output

### Output Panel

![Buzzer Settings](./imgs/buzzer-settings.png)

#### Output Variables

Tossup agents must produce the following output variables:
- `answer`: The answer to the input question
- `confidence`: The confidence score of the answer

#### Buzzer Settings (For Tossup Agents)

- **Confidence Threshold**: Minimum value of the `confidence` output variable to consider a buzz (0.0-1.0)
- **Buzz Probability**: Minimum probability that the LLM assigns to its output tokens, computed from their `logprobs`: $p(y \mid x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Only some models support `logprobs`.
- **Method Dropdown**: 
  - AND: Both conditions must be true to buzz
  - OR: Either condition can trigger a buzz
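The buzz decision can be sketched as follows, computing the buzz probability with the formula given for **Buzz Probability**. The function names are hypothetical, and so is the fallback for models without `logprobs` (checking confidence only); the interface's actual behavior in that case may differ.

```python
import math
from typing import Optional


def buzz_probability(token_logprobs: list[float]) -> float:
    """p(y|x) = exp(sum of the output tokens' log-probabilities)."""
    return math.exp(sum(token_logprobs))


def should_buzz(
    confidence: float,
    token_logprobs: Optional[list[float]],
    confidence_threshold: float,
    probability_threshold: float,
    method: str = "AND",
) -> bool:
    """Combine the confidence and buzz-probability conditions with AND or OR."""
    confidence_ok = confidence >= confidence_threshold
    if token_logprobs is None:
        # Assumption: if the model gives no logprobs, only confidence is checked.
        return confidence_ok
    probability_ok = buzz_probability(token_logprobs) >= probability_threshold
    if method == "AND":
        return confidence_ok and probability_ok
    if method == "OR":
        return confidence_ok or probability_ok
    raise ValueError(f"unknown method: {method!r}")


logprobs = [-0.05, -0.10]  # p(y|x) = exp(-0.15), about 0.86
print(should_buzz(0.9, logprobs, 0.8, 0.9, "AND"))  # False
print(should_buzz(0.9, logprobs, 0.8, 0.9, "OR"))   # True
```

Here a confidence of 0.9 clears the 0.8 threshold, but the buzz probability of about 0.86 falls short of 0.9, so AND holds back while OR buzzes.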

## Testing Components

### Question Selection

- **Question ID**: Enter ID to load specific question
- **Sample Question**: Use provided sample
- **Run Button**: Process question with current pipeline

### Results Visualization

#### Tossup Visualization

![Tossup Results](./imgs/tossup-viz.png)

- **Highlighted Question Text**:
  - Highlighted tokens mark the positions where the model is probed with the question text up to that point
  - Gray, green, or red highlighting shows whether the model has not yet buzzed, buzzed correctly, or buzzed incorrectly
  - Hover over a highlighted token to see the answer and confidence at that position
  
- **Answer Popup**:
  - Shows final answer
  - Displays confidence score
  - Indicates correctness

- **Buzz Confidence Graph**:
  - X-axis: Token position
  - Y-axis: Confidence (0.0-1.0)
  - Blue line: Confidence progression
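The probing behind this visualization can be sketched as a loop over growing question prefixes. This is a simplified, hypothetical model: `probe_tossup` and `toy_agent` are illustrative names, tokens are split on whitespace, only the confidence threshold is checked, and probing continues after the first buzz so the full confidence curve is available for plotting.

```python
from typing import Callable, Optional

Probe = tuple[int, str, float]  # (token position, answer, confidence)


def probe_tossup(
    tokens: list[str],
    positions: list[int],
    agent: Callable[[str], tuple[str, float]],
    threshold: float,
) -> tuple[list[Probe], Optional[Probe]]:
    """Run the agent on each prefix ending at a probe position; record the first buzz."""
    trace: list[Probe] = []
    first_buzz: Optional[Probe] = None
    for pos in positions:
        prefix = " ".join(tokens[: pos + 1])  # question text up to and including this token
        answer, confidence = agent(prefix)
        trace.append((pos, answer, confidence))
        if first_buzz is None and confidence >= threshold:
            first_buzz = (pos, answer, confidence)
    return trace, first_buzz


def toy_agent(prefix: str) -> tuple[str, float]:
    """Hypothetical stand-in for a pipeline: confidence grows as more text is revealed."""
    return "Gold", min(1.0, len(prefix.split()) / 10)


tokens = "For 10 points name this element with atomic number 79".split()
trace, buzz = probe_tossup(tokens, [1, 3, 5, 7, 9], toy_agent, threshold=0.7)
print(buzz)  # (7, 'Gold', 0.8)
```

Each entry in `trace` corresponds to a point on the confidence graph, and the first entry that crosses the threshold marks where the buzz would be highlighted.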

#### Bonus Visualization

- **Question Display**: Shows the lead-in and parts
- **Results Table**: 
  - Part number
  - Correctness indicator
  - Confidence score
  - Prediction
  - Explanation

## Pipeline Management

### Import/Export

![Import Pipeline](./imgs/import-pipeline.png)
- **Select Pipeline to Import** dropdown: Load existing pipeline configuration
- **Import Pipeline**: Apply selected pipeline configuration

![Export Pipeline](./imgs/pipeline-preview.png)
- **Export Pipeline**: Save configuration as YAML
- **Pipeline Preview**: View and edit pipeline configuration in YAML format

### Evaluation and Submission

- **Evaluate**: Run comprehensive assessment
- **Model Name**: Name for submission
- **Description**: Details about your agent
- **Sign in with Hugging Face**: Authenticate with your Hugging Face account
- **Submit**: Submit agent for official evaluation

## Tips for Effective Use

- Use the system prompt to give clear instructions
- Test different confidence thresholds to find optimal settings
- Monitor buzz positions in the visualization
- Examine confidence trends to identify problem areas
- Use multi-step pipelines for complex tasks