Maharshi Gor

Quizbowl Agent Goals and Evaluation

Objectives

Tossup Agents

  • Respond to each question with the best guess and a calibrated confidence score
  • Buzz as early as possible once there is enough information to answer
  • Avoid incorrect buzzes
  • Maintain consistent performance across topics

Bonus Agents

  • Answer parts correctly with accurate confidence estimation
  • Provide a clear explanation of the reasoning, which human team members will use to validate or choose the suggested answer
  • Adapt to varying difficulty levels (easy, medium, hard)

Performance Metrics

Tossup Metrics

  • Accuracy: Percentage of correct answers
  • Average Buzz Position: How early in the question you buzz (earlier is better)
  • Confidence Calibration: How well the confidence score matches actual correctness
  • Score: Points earned based on buzz position and correctness
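As a rough illustration of the first two metrics above, the sketch below summarizes a batch of tossup runs. The record keys (`correct`, `buzz_position`, `num_tokens`) and the `tossup_summary` helper are hypothetical and not part of the competition code; the scoring rule is not specified here, so it is left out.

```python
from statistics import mean


def tossup_summary(results):
    """Summarize tossup runs.

    Each result is a dict with hypothetical keys (assumptions, not a real API):
      correct        bool, whether the buzzed answer was right
      buzz_position  int, token index at which the agent buzzed
      num_tokens     int, total number of tokens in the question
    """
    if not results:
        raise ValueError("results must not be empty")
    accuracy = mean(1.0 if r["correct"] else 0.0 for r in results)
    # Normalize the buzz position to [0, 1] so that questions of
    # different lengths are comparable; lower means an earlier buzz.
    avg_buzz = mean(r["buzz_position"] / r["num_tokens"] for r in results)
    return {"accuracy": accuracy, "avg_buzz_position": avg_buzz}


results = [
    {"correct": True, "buzz_position": 40, "num_tokens": 100},
    {"correct": False, "buzz_position": 20, "num_tokens": 100},
    {"correct": True, "buzz_position": 60, "num_tokens": 120},
    {"correct": True, "buzz_position": 90, "num_tokens": 100},
]
summary = tossup_summary(results)
print(summary)
```

Normalizing by question length is a design choice made for this sketch; a raw token index works as well if all questions have similar lengths.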

Bonus Metrics

  • Accuracy: Percentage of correct answers across all parts
  • Confidence Calibration: How well the confidence score matches actual correctness
  • Explanation Quality: Relevance and clarity of reasoning
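One common way to quantify confidence calibration is expected calibration error (ECE), sketched below. This is an assumption about how calibration could be measured, not necessarily the measure used by the official evaluation; the function name and the choice of 10 equal-width bins are illustrative.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Expected calibration error over equal-width confidence bins.

    confidences: floats in [0, 1], one per answer
    correct:     bools, whether each answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("inputs must not be empty")

    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        avg_acc = sum(a for _, a in items) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of answers.
        ece += (len(items) / total) * abs(avg_conf - avg_acc)
    return ece


# Four answers given at 0.9 confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(round(ece, 3))
```

A value near 0 means stated confidence tracks observed accuracy; in the example the agent claims 0.9 but is right 75% of the time, so the gap is about 0.15.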

Evaluating Your Agent

Testing Baseline Performance

  1. Run the default agent configuration
  2. Record metrics (accuracy, confidence, buzz position)
  3. Identify specific weaknesses in performance

Validating Improvements

After each enhancement:

  1. Run the agent on the same development set of questions
  2. Compare metrics to the previous version
  3. Check for improvements in weak areas
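The comparison step can be sketched as a small helper that diffs two metric dictionaries recorded on the same development set. The `compare_metrics` function and the metric names are hypothetical; the only assumption carried over from the metrics section is that an earlier buzz position is better.

```python
def compare_metrics(baseline, candidate, lower_is_better=("avg_buzz_position",)):
    """Report the change in every metric present in both runs.

    baseline, candidate: dicts mapping metric name -> float
    lower_is_better:     metric names where a decrease counts as improvement
    """
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[name] - baseline[name]
        improved = delta < 0 if name in lower_is_better else delta > 0
        report[name] = {"delta": delta, "improved": improved}
    return report


baseline = {"accuracy": 0.60, "avg_buzz_position": 0.70}
candidate = {"accuracy": 0.65, "avg_buzz_position": 0.75}
report = compare_metrics(baseline, candidate)
for name, row in report.items():
    print(f"{name}: delta={row['delta']:+.3f} improved={row['improved']}")
```

In this made-up example accuracy rises while the agent buzzes later, which is the kind of trade-off worth checking before keeping a change.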

Final Evaluation Criteria

Your final agent will be evaluated on:

  1. Overall accuracy across diverse questions
  2. Optimal buzz timing (neither too early nor too late)
  3. Confidence threshold calibration
  4. Explanation quality (for bonus agents)
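To explore the trade-off between buzz timing and the confidence threshold, one option is to replay recorded per-step confidences on a development set and sweep candidate thresholds. Everything in the sketch below is an assumption for illustration: the step format `(confidence, is_correct)`, the helper names, and especially the reward and penalty values, which are placeholders rather than the official scoring.

```python
def simulate_buzz(steps, threshold):
    """Return (buzzed, is_correct, position) for the first step whose
    confidence meets the threshold, or (False, False, None) if none does."""
    for position, (confidence, is_correct) in enumerate(steps):
        if confidence >= threshold:
            return True, is_correct, position
    return False, False, None


def sweep_thresholds(questions, thresholds, reward=1.0, penalty=-0.5):
    """Total score per threshold under a placeholder scoring rule:
    `reward` for a correct buzz, `penalty` for a wrong one, 0 for no buzz."""
    totals = {}
    for threshold in thresholds:
        score = 0.0
        for steps in questions:
            buzzed, is_correct, _ = simulate_buzz(steps, threshold)
            if buzzed:
                score += reward if is_correct else penalty
        totals[threshold] = score
    return totals


# Each question is a list of (confidence, is_correct) pairs, one per reveal step.
questions = [
    [(0.3, False), (0.6, True), (0.9, True)],
    [(0.5, False), (0.7, False), (0.95, True)],
]
totals = sweep_thresholds(questions, [0.5, 0.8, 0.99])
best = max(totals, key=totals.get)
print(totals, best)
```

With this toy data a low threshold triggers an early wrong buzz and a very high one never buzzes, so the middle threshold scores best.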