Maharshi Gor

Quizbowl Agent Web Interface Reference

This guide explains all elements of the web interface for creating and testing quizbowl agents.

Navigation

The interface has four main tabs:

  • Tossup Agents: Create and test agents for tossup questions
  • Bonus Round Agents: Create and test agents for bonus questions
  • Leaderboard: View the leaderboard of submitted agents
  • Help: Access documentation and support resources

Pipeline Creation Components

Let's walk through the components of the Tossup Agent pipeline creation interface.

Model Step Management

A model step is a single LLM call in the pipeline. A pipeline can have multiple model steps.

  • + Add Step: Adds a new step to your pipeline
  • Step ID: Unique identifier for each step (A, B, C, etc.)
  • Step Name: Descriptive name for the step
  • Available when the pipeline has more than one model step:
    • Delete Step (×): Removes a step from the pipeline
    • Move Up (↑): Moves a step up in the pipeline
    • Move Down (↓): Moves a step down in the pipeline
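To make the step controls concrete, here is a minimal sketch of a pipeline as an ordered list of model steps. The `Pipeline` and `ModelStep` classes are hypothetical and are not the app's actual code. They only mirror the behaviour listed above: letter IDs, deletion only when more than one step exists, and moving steps up or down. Whether the real app re-letters the remaining steps after a deletion is not covered by this guide; the sketch simply keeps existing IDs.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase
from typing import List


@dataclass
class ModelStep:
    # Hypothetical representation of one LLM call in the pipeline.
    step_id: str  # "A", "B", "C", ...
    name: str     # descriptive step name shown in the UI


@dataclass
class Pipeline:
    steps: List[ModelStep] = field(default_factory=list)

    def add_step(self, name: str) -> ModelStep:
        # Give the new step the first letter not already used as a step ID.
        used = {s.step_id for s in self.steps}
        step = ModelStep(next(c for c in ascii_uppercase if c not in used), name)
        self.steps.append(step)
        return step

    def _index(self, step_id: str) -> int:
        return next(i for i, s in enumerate(self.steps) if s.step_id == step_id)

    def delete_step(self, step_id: str) -> None:
        # Like the UI, refuse to delete the only remaining step.
        if len(self.steps) <= 1:
            raise ValueError("a pipeline needs at least one model step")
        del self.steps[self._index(step_id)]

    def move_up(self, step_id: str) -> None:
        i = self._index(step_id)
        if i > 0:  # the first step cannot move further up
            self.steps[i - 1], self.steps[i] = self.steps[i], self.steps[i - 1]

    def move_down(self, step_id: str) -> None:
        i = self._index(step_id)
        if i < len(self.steps) - 1:  # the last step cannot move further down
            self.steps[i], self.steps[i + 1] = self.steps[i + 1], self.steps[i]

    def order(self) -> List[str]:
        return [s.step_id for s in self.steps]


pipeline = Pipeline()
pipeline.add_step("Extract clues")
pipeline.add_step("Answer question")
pipeline.move_up("B")
print(pipeline.order())
```

The order of steps matters because a later step can only read variables produced by steps that run before it.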

Model Selection

  • Model Dropdown: Select language model provider and model
  • Temperature Slider: Adjust randomness of outputs (0.0-1.0)
    • Lower values (0.1-0.3): More consistent, closer to deterministic outputs
    • Higher values (0.7-1.0): More creative, varied outputs
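Temperature is commonly applied by dividing the model's raw token scores (logits) by T before the softmax. A low T sharpens the distribution toward the top token, and a high T flattens it. The function below is a generic illustration of that idea. It does not describe how any particular provider in the model dropdown implements temperature, and the logits are made-up numbers.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0.0:
        # As T approaches 0 this becomes greedy decoding: all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 0.2 almost all probability lands on the first token, while at T = 1.0 the other two tokens keep a noticeable share, which is why higher temperatures give more varied outputs.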

System Prompt

  • System Prompt Tab: Contains instructions for the model
  • Text Editor: Edit the instructions directly; changes are applied to the system prompt when the editor loses focus

Input/Output Configuration

Inputs Tab

  • Variable Used: Name of the pipeline variable this input reads from (e.g., question_text)
  • Input Name: Name the model sees (e.g., question)
  • Description: Explains the input's purpose
  • + Button: Adds a new input variable
  • × Button: Removes an input variable
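The relationship between "Variable Used" and "Input Name" can be sketched as a lookup from the pipeline's current variables into the names the model sees. The `InputField` class, `build_model_inputs` function, and the text layout it produces are hypothetical illustrations; the guide does not specify how the app actually formats inputs for the model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InputField:
    variable: str     # "Variable Used": name of the value in the pipeline state
    name: str         # "Input Name": the name the model sees
    description: str  # "Description": explains the input's purpose


def build_model_inputs(fields: List[InputField], state: Dict[str, str]) -> str:
    """Render the inputs one model step receives (hypothetical format)."""
    lines = []
    for f in fields:
        if f.variable not in state:
            # A step can only read variables that already exist when it runs.
            raise KeyError(f"pipeline variable {f.variable!r} is not available yet")
        lines.append(f"{f.name} ({f.description}): {state[f.variable]}")
    return "\n".join(lines)


fields = [InputField("question_text", "question", "The question text read so far")]
state = {"question_text": "For 10 points, name this author of The Old Man and the Sea."}
print(build_model_inputs(fields, state))
```

Separating the two names lets you keep a stable pipeline variable while presenting it to the model under whatever name reads best in the prompt.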

Outputs Tab

  • Output Field: Name of the output variable (e.g., answer)
  • Type Dropdown: Data type (str, float, list, bool)
  • Description: Explains what the output represents
  • Arrow Buttons: Change output order
  • + Button: Adds a new output
  • × Button: Removes an output
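The Type Dropdown implies that each raw value produced by the model has to be read as one of the four declared types. The sketch below shows one way to do that. `coerce_output` is hypothetical, and two details are assumptions: lists are expected as JSON arrays, and booleans accept only a few common spellings.

```python
import json
from typing import Any


def coerce_output(raw: str, type_name: str) -> Any:
    """Convert a raw string produced by the model into the declared output type."""
    text = raw.strip()
    if type_name == "str":
        return text
    if type_name == "float":
        return float(text)
    if type_name == "bool":
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"cannot read {raw!r} as bool")
    if type_name == "list":
        # Assumption: list outputs arrive as JSON arrays, e.g. '["a", "b"]'.
        value = json.loads(text)
        if not isinstance(value, list):
            raise ValueError(f"expected a JSON list, got {raw!r}")
        return value
    raise ValueError(f"unsupported output type {type_name!r}")


declared = {"answer": "str", "confidence": "float"}
raw_outputs = {"answer": " Ernest Hemingway ", "confidence": "0.85"}
parsed = {key: coerce_output(raw_outputs[key], t) for key, t in declared.items()}
print(parsed)
```

Declaring `confidence` as a float matters for tossup agents, because the buzzer compares it numerically against the confidence threshold.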

Output Panel

Output Variables

Tossup agents must produce the following output variables:

  • answer: The answer to the input question
  • confidence: The confidence score of the answer

Buzzer Settings (For Tossup Agents)

  • Confidence Threshold: Minimum value of the confidence output variable required to buzz (0.0-1.0)
  • Buzz Probability: Minimum probability of the LLM's output tokens required to buzz, computed from their logprobs as $p(y \mid x) = \exp\left(\sum_{y_i \in y} \text{logprob}(y_i)\right)$. Note that only some models support logprobs.
  • Method Dropdown:
    • AND: Both conditions must be true to buzz
    • OR: Either condition alone can trigger a buzz
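The two buzzer settings and the AND/OR method can be combined into a single yes/no decision, sketched below. `buzz_probability` follows the formula given for Buzz Probability. Everything else is an assumption for illustration: `should_buzz` is hypothetical, "minimum value" is read as an inclusive comparison, and the fallback to the confidence check alone when a model exposes no logprobs is a guess, not documented behaviour.

```python
import math
from typing import List, Optional


def buzz_probability(token_logprobs: List[float]) -> float:
    """p(y|x) = exp(sum of logprob(y_i)) over the output tokens."""
    return math.exp(sum(token_logprobs))


def should_buzz(
    confidence: float,
    token_logprobs: Optional[List[float]],
    confidence_threshold: float,
    probability_threshold: float,
    method: str = "AND",
) -> bool:
    """Hypothetical buzz rule combining the two buzzer settings."""
    # Assumption: "minimum value" is treated as an inclusive (>=) comparison.
    confidence_ok = confidence >= confidence_threshold
    if token_logprobs is None:
        # Assumption: without logprobs, fall back to the confidence condition alone.
        return confidence_ok
    probability_ok = buzz_probability(token_logprobs) >= probability_threshold
    if method == "AND":
        return confidence_ok and probability_ok
    if method == "OR":
        return confidence_ok or probability_ok
    raise ValueError(f"unknown method: {method!r}")


# Made-up outputs for one probe point: confidence 0.9 and two output tokens.
logprobs = [-0.05, -0.10]  # exp(-0.15) is roughly 0.86
print(should_buzz(0.9, logprobs, confidence_threshold=0.8, probability_threshold=0.9, method="AND"))
print(should_buzz(0.9, logprobs, confidence_threshold=0.8, probability_threshold=0.9, method="OR"))
```

With these numbers the confidence condition passes but the probability condition does not, so AND holds the buzz while OR allows it. AND is the more conservative choice; OR buzzes earlier at the risk of more wrong buzzes.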

Testing Components

Question Selection

  • Question ID: Enter an ID to load a specific question
  • Sample Question: Use the provided sample question
  • Run Button: Run the current pipeline on the selected question

Results Visualization

Tossup Visualization

  • Highlighted Question Text:

    • Highlighted tokens mark the points where the model is probed with the question text up to that token
    • Gray/green/red highlighting shows whether the model did not buzz, buzzed correctly, or buzzed incorrectly at that point
    • Hover over a highlighted token to see the model's answer and confidence at that point
  • Answer Popup:

    • Shows final answer
    • Displays confidence score
    • Indicates correctness
  • Buzz Confidence Graph:

    • X-axis: Token position
    • Y-axis: Confidence (0.0-1.0)
    • Blue line: Confidence progression
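Both the highlighting and the confidence graph come from probing the agent at successive positions in the question. A rough sketch of that loop follows. Every part of it is an assumption for illustration: `probe_tossup` and `toy_agent` are hypothetical, a buzz is decided by the confidence threshold alone, correctness is a naive case-insensitive string match, and the probe step size is arbitrary.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent: maps the question text read so far to (answer, confidence).
Agent = Callable[[str], Tuple[str, float]]


def probe_tossup(
    tokens: List[str], agent: Agent, gold: str, threshold: float, step: int = 1
) -> Tuple[Optional[int], List[Tuple[int, str, float, str]]]:
    """Probe the agent on growing prefixes and colour each probe point."""
    trace = []
    first_buzz = None
    for end in range(step, len(tokens) + 1, step):
        answer, confidence = agent(" ".join(tokens[:end]))
        if confidence < threshold:
            color = "gray"  # no buzz at this point
        elif answer.strip().lower() == gold.strip().lower():
            color = "green"  # buzz with a correct answer (naive exact match)
        else:
            color = "red"  # buzz with an incorrect answer
        if color != "gray" and first_buzz is None:
            first_buzz = end - 1  # index of the last token the agent had seen
        trace.append((end - 1, answer, confidence, color))
    return first_buzz, trace


def toy_agent(prefix: str) -> Tuple[str, float]:
    # Stand-in for a real pipeline: confidence grows as more words are revealed.
    n = len(prefix.split())
    return ("Hemingway" if n >= 4 else "Faulkner", min(1.0, n / 5))


tokens = "This author wrote The Old Man and the Sea".split()
first_buzz, trace = probe_tossup(tokens, toy_agent, gold="Hemingway", threshold=0.7)
for index, answer, confidence, color in trace:
    print(tokens[index], answer, round(confidence, 2), color)
```

Plotting the `confidence` values from `trace` against token position gives the shape of the Buzz Confidence Graph.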

Bonus Visualization

  • Question Display: Shows the leadin and each part of the bonus
  • Results Table:
    • Part number
    • Correctness indicator
    • Confidence score
    • Prediction
    • Explanation
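Each row of the bonus results table holds the same five fields, so the table maps naturally onto a small record type. The sketch below is hypothetical: the `BonusPartResult` field names, the text layout, and the sample predictions are made up for illustration. It only shows how the columns relate and how a simple correct-count summary can be derived from them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BonusPartResult:
    # One row of the results table; field names are illustrative, not the app's schema.
    part: int
    correct: bool
    confidence: float
    prediction: str
    explanation: str


def render_table(rows: List[BonusPartResult]) -> str:
    """Format rows like the results table and append a correct-count summary."""
    lines = ["Part | Correct | Confidence | Prediction | Explanation"]
    for r in rows:
        mark = "yes" if r.correct else "no"
        lines.append(f"{r.part} | {mark} | {r.confidence:.2f} | {r.prediction} | {r.explanation}")
    n_correct = sum(r.correct for r in rows)
    lines.append(f"{n_correct}/{len(rows)} parts correct")
    return "\n".join(lines)


rows = [  # made-up predictions for a three-part bonus
    BonusPartResult(1, True, 0.92, "Mitochondria", "Clue about ATP production"),
    BonusPartResult(2, False, 0.40, "Ribosome", "Guessed from 'organelle'"),
    BonusPartResult(3, True, 0.75, "Chloroplast", "Mentions photosynthesis"),
]
print(render_table(rows))
```

Reading the confidence column next to the correctness column is a quick way to check whether the agent's confidence is calibrated: wrong parts should tend to have lower confidence.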

Pipeline Management

Import/Export

Import Pipeline

  • Select Pipeline to Import dropdown: Choose an existing pipeline configuration to load
  • Import Pipeline: Apply the selected pipeline configuration

Export Pipeline

  • Export Pipeline: Save configuration as YAML
  • Pipeline Preview: View and edit pipeline configuration in YAML format

Evaluation and Submission

  • Evaluate: Run comprehensive assessment
  • Model Name: Name for your submission
  • Description: Details about your agent
  • Sign in with Hugging Face: Authenticate with your Hugging Face account
  • Submit: Submit agent for official evaluation

Tips for Effective Use

  • Use the system prompt to give clear instructions
  • Test different confidence thresholds to find optimal settings
  • Monitor buzz positions in the visualization
  • Examine confidence trends to identify problem areas
  • Use multi-step pipelines for complex tasks
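Testing confidence thresholds can also be done offline if you record, for each question, the agent's confidence and whether its answer was correct at every probe point. The sketch below replays such recordings under several thresholds. The traces are made-up data and `sweep_thresholds` is a hypothetical helper. For simplicity it ignores the Buzz Probability condition and treats the first probe point at or above the threshold as the buzz.

```python
from typing import Dict, List, Tuple

# One trace per question: (confidence, answer_is_correct) at each probe point, in reading order.
Trace = List[Tuple[float, bool]]


def sweep_thresholds(traces: List[Trace], thresholds: List[float]) -> Dict[float, dict]:
    """Replay recorded traces and count buzz outcomes for each threshold."""
    results = {}
    for t in thresholds:
        correct = wrong = no_buzz = 0
        positions = []
        for trace in traces:
            for pos, (conf, is_correct) in enumerate(trace):
                if conf >= t:  # first probe point at or above the threshold is the buzz
                    positions.append(pos)
                    if is_correct:
                        correct += 1
                    else:
                        wrong += 1
                    break
            else:  # the loop ended without a break: the agent never buzzed
                no_buzz += 1
        mean_pos = sum(positions) / len(positions) if positions else None
        results[t] = {"correct": correct, "wrong": wrong,
                      "no_buzz": no_buzz, "mean_buzz_position": mean_pos}
    return results


traces = [  # made-up recordings for two questions
    [(0.3, False), (0.6, True), (0.9, True)],
    [(0.5, False), (0.85, False), (0.95, True)],
]
for threshold, stats in sweep_thresholds(traces, [0.5, 0.9, 0.99]).items():
    print(threshold, stats)
```

A good threshold balances the two sides visible in the output. Lower thresholds buzz earlier but produce more wrong buzzes, and higher thresholds are safer but may never buzz.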