# Quizbowl Agent Web Interface Reference

This guide explains all elements of the web interface for creating and testing quizbowl agents.

## Navigation

The interface has four main tabs:
- **Tossup Agents**: Create and test agents for tossup questions
- **Bonus Round Agents**: Create and test agents for bonus questions 
- **Leaderboard**: View leaderboard of agents
- **Help**: Access documentation and support resources

## Pipeline Creation Components

Let's walk through the components of the Tossup Agent pipeline creation interface.
![Tossup Agent Pipeline Creation Interface](./imgs/tossup-agent-pipeline.png)

### Model Step Management

A model step is a single LLM call in the pipeline. A pipeline can contain multiple model steps.
- **+ Add Step**: Adds a new step to your pipeline
- **Step ID**: Unique identifier for each step (A, B, C, etc.)
- **Step Name**: Descriptive name for the step
- Available when the pipeline has more than one model step:
  - **Delete Step** (×): Removes a step from the pipeline
  - **Move Up** (↑): Moves a step up in the pipeline
  - **Move Down** (↓): Moves a step down in the pipeline

### Model Selection

- **Model Dropdown**: Select language model provider and model
- **Temperature Slider**: Adjust randomness of outputs (0.0-1.0)
  - Lower values (0.1-0.3): More consistent, deterministic outputs
  - Higher values (0.7-1.0): More creative, varied outputs
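
To illustrate why lower temperatures give more consistent outputs, here is a minimal sketch of the standard temperature-scaled softmax used when sampling tokens. The function name and the example logits are hypothetical, and the exact sampling behavior depends on the model provider.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into sampling probabilities (illustrative only)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Limit of the softmax as temperature approaches 0: all mass on the top token.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.0]
low_temp = softmax_with_temperature(example_logits, 0.2)   # sharply peaked
high_temp = softmax_with_temperature(example_logits, 1.0)  # flatter, more varied
```

At a low temperature the top-scoring token receives almost all of the probability mass, so repeated runs tend to agree; at a higher temperature the alternatives are sampled more often.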

### System Prompt

- **System Prompt Tab**: Contains instructions for the model
- **Text Editor**: Edit the instructions directly; changes are applied to the system prompt when the editor loses focus

### Input/Output Configuration

#### Inputs Tab

![Inputs Tab](./imgs/inputs-tab.png)

- **Variable Used**: Reference name in pipeline (e.g., question_text)
- **Input Name**: Name the model sees (e.g., question)
- **Description**: Explains the input's purpose
- **+ Button**: Adds a new input variable
- **× Button**: Removes an input variable

#### Outputs Tab

![Outputs Tab](./imgs/outputs-tab.png)

- **Output Field**: Name of the output variable (e.g., answer)
- **Type Dropdown**: Data type (str, float, list, bool)
- **Description**: Explains what the output represents
- **Arrow Buttons**: Change output order
- **+ Button**: Adds a new output
- **× Button**: Removes an output
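
The fields in the Inputs and Outputs tabs can be thought of as a small schema for each model step. The sketch below is an assumption about how such a schema might be represented, not the interface's actual implementation; `InputField`, `OutputField`, `build_model_inputs`, and `validate_outputs` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class InputField:
    variable: str     # "Variable Used": reference name in the pipeline
    name: str         # "Input Name": name the model sees
    description: str

@dataclass
class OutputField:
    name: str         # "Output Field": name of the output variable
    type: str         # "Type Dropdown": one of str, float, list, bool
    description: str

_TYPES = {"str": str, "float": float, "list": list, "bool": bool}

def build_model_inputs(fields, pipeline_vars):
    """Rename pipeline variables to the names the model sees."""
    return {f.name: pipeline_vars[f.variable] for f in fields}

def validate_outputs(fields, raw):
    """Return outputs in declared order, checking each value's declared type."""
    result = {}
    for f in fields:
        value = raw[f.name]
        expected = _TYPES[f.type]
        # Accept plain integers where a float is declared (but not booleans).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise TypeError(f"{f.name} should be {f.type}, got {type(value).__name__}")
        result[f.name] = value
    return result

inputs = [InputField("question_text", "question", "The question so far")]
outputs = [
    OutputField("answer", "str", "Predicted answer"),
    OutputField("confidence", "float", "Confidence in the answer"),
]
model_inputs = build_model_inputs(inputs, {"question_text": "Who wrote Hamlet?"})
checked = validate_outputs(outputs, {"confidence": 1, "answer": "Shakespeare"})
```

Keeping the pipeline variable name separate from the name shown to the model lets a later step reuse an earlier step's output under whatever label suits its prompt.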

### Output Panel

![Buzzer Settings](./imgs/buzzer-settings.png)

#### Output Variables

Tossup agents must produce the following output variables:
- `answer`: The answer to the input question
- `confidence`: The confidence score of the answer

#### Buzzer Settings (For Tossup Agents)

- **Confidence Threshold**: Minimum value of the `confidence` output variable to consider a buzz (0.0-1.0)
- **Buzz Probability**: Minimum value of the normalized probability of the LLM's output tokens, computed from the `logprobs` of those tokens: $p(y|x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Only some models support `logprobs`.
- **Method Dropdown**: 
  - AND: Both conditions must be true to buzz
  - OR: Either condition can trigger a buzz
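
The buzzer settings above can be sketched as follows. This is a minimal illustration rather than the interface's actual code: the function names are hypothetical, and treating each threshold as an inclusive minimum (`>=`) is an assumption.

```python
import math

def sequence_probability(logprobs):
    """p(y|x) = exp(sum of the output tokens' logprobs)."""
    return math.exp(sum(logprobs))

def should_buzz(confidence, logprobs, confidence_threshold, prob_threshold, method="AND"):
    """Combine the two buzzer conditions with the selected method."""
    confidence_ok = confidence >= confidence_threshold                  # assumed inclusive
    probability_ok = sequence_probability(logprobs) >= prob_threshold   # assumed inclusive
    if method == "AND":
        return confidence_ok and probability_ok
    if method == "OR":
        return confidence_ok or probability_ok
    raise ValueError(f"unknown method: {method}")

# Two output tokens, each with probability 0.5, give a sequence probability of 0.25.
token_logprobs = [math.log(0.5), math.log(0.5)]
buzz_and = should_buzz(0.9, token_logprobs, 0.8, 0.5, method="AND")
buzz_or = should_buzz(0.9, token_logprobs, 0.8, 0.5, method="OR")
```

In this example the confidence condition holds but the probability condition does not, so AND withholds the buzz while OR allows it.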

## Testing Components

### Question Selection

- **Question ID**: Enter ID to load specific question
- **Sample Question**: Use provided sample
- **Run Button**: Process question with current pipeline

### Results Visualization

#### Tossup Visualization

![Tossup Results](./imgs/tossup-viz.png)

- **Highlighted Question Text**:
  - Highlighted tokens mark the positions where the model is probed with the question text up to that point
  - Gray, green, or red highlighting shows whether the model has not buzzed, buzzed correctly, or buzzed incorrectly at that position
  - Hover for answer/confidence details
  
- **Answer Popup**:
  - Shows final answer
  - Displays confidence score
  - Indicates correctness

- **Buzz Confidence Graph**:
  - X-axis: Token position
  - Y-axis: Confidence (0.0-1.0)
  - Blue line: Confidence progression
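
When reading the confidence graph, it can help to ask where a given threshold would first be crossed. The sketch below assumes the confidence trace is available as a list of `(token_position, confidence)` pairs; that format, the sample values, and the function name are hypothetical.

```python
def first_buzz_position(trace, threshold):
    """Return the first token position whose confidence meets the threshold, or None."""
    for position, confidence in trace:
        if confidence >= threshold:
            return position
    return None

# Hypothetical trace: confidence rises as more of the question is revealed.
trace = [(5, 0.20), (10, 0.55), (15, 0.85), (20, 0.90)]
early = first_buzz_position(trace, 0.5)    # lower threshold, earlier buzz
late = first_buzz_position(trace, 0.8)     # higher threshold, later buzz
never = first_buzz_position(trace, 0.95)   # threshold never reached
```

Comparing the buzz position across several thresholds shows the trade-off between buzzing early and buzzing only when the agent is confident.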

#### Bonus Visualization

- **Question Display**: Shows leadin and parts
- **Results Table**: 
  - Part number
  - Correctness indicator
  - Confidence score
  - Prediction
  - Explanation

## Pipeline Management

### Import/Export

![Import Pipeline](./imgs/import-pipeline.png)
- **Select Pipeline to Import** dropdown: Load existing pipeline configuration
- **Import Pipeline**: Apply selected pipeline configuration

![Export Pipeline](./imgs/pipeline-preview.png)
- **Export Pipeline**: Save configuration as YAML
- **Pipeline Preview**: View and edit pipeline configuration in YAML format

### Evaluation and Submission

- **Evaluate**: Run comprehensive assessment
- **Model Name**: Name for submission
- **Description**: Details about your agent
- **Sign in with Hugging Face**: Authentication
- **Submit**: Submit agent for official evaluation

## Tips for Effective Use

- Use the system prompt to give clear instructions
- Test different confidence thresholds to find optimal settings
- Monitor buzz positions in the visualization
- Examine confidence trends to identify problem areas
- Use multi-step pipelines for complex tasks