Quizbowl Agent Goals and Evaluation
Objectives
Tossup Agents
- Answer each question with the best guess and a calibrated confidence score
- Buzz at the earliest point where there is enough information to answer correctly
- Avoid incorrect buzzes
- Maintain consistent performance across topics
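One simple way to connect calibrated confidence with early buzzing is a confidence-threshold rule: buzz on the first prefix of the question where confidence reaches a threshold. The sketch below assumes the agent is re-run after each newly revealed chunk and reports a (guess, confidence) pair each time; `first_buzz` and that data layout are hypothetical, not part of any provided framework.

```python
from typing import Optional, Sequence, Tuple


def first_buzz(
    run: Sequence[Tuple[str, float]], threshold: float
) -> Optional[Tuple[int, str, float]]:
    """Return (position, guess, confidence) for the first prefix whose
    confidence reaches the threshold, or None if the agent never buzzes.

    `run` holds the agent's (guess, confidence) after each revealed chunk
    of the question, in reading order (hypothetical data layout).
    """
    for position, (guess, confidence) in enumerate(run):
        if confidence >= threshold:
            return position, guess, confidence
    return None


# Example: the agent only becomes confident at the third chunk.
run = [("Paris", 0.20), ("Lyon", 0.45), ("Marseille", 0.82), ("Marseille", 0.95)]
print(first_buzz(run, threshold=0.8))  # (2, 'Marseille', 0.82)
```

A lower threshold buzzes earlier but risks more incorrect buzzes; a higher one is safer but later.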
Bonus Agents
- Answer each part correctly, with an accurate confidence estimate
- Provide a clear explanation of the reasoning, which human team members will use to validate or select the suggested answer
- Adapt to varying difficulty levels (easy, medium, hard)
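Because human teammates act on bonus answers, it helps to keep the answer, its confidence, and its explanation together in one record. The sketch below is a hypothetical schema (`BonusPartResponse` and `summary` are illustrative names, not a required interface) that also rejects confidences outside [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BonusPartResponse:
    """One bonus-part answer as handed to human teammates (hypothetical schema)."""

    answer: str
    confidence: float  # estimated probability that `answer` is correct, in [0, 1]
    explanation: str   # short reasoning the team can check before accepting

    def __post_init__(self) -> None:
        # Guard against uncalibrated or malformed scores slipping through.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")

    def summary(self) -> str:
        """Render a one-line summary the team can validate or override."""
        return f"{self.answer} ({self.confidence:.0%}): {self.explanation}"


resp = BonusPartResponse(
    "Niels Bohr", 0.7, "Clue mentions the 1913 atomic model with quantized orbits."
)
print(resp.summary())
```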
Performance Metrics
Tossup Metrics
- Accuracy: Percentage of correct answers
- Average Buzz Position: How early in the question you buzz (earlier is better, provided the answer is correct)
- Confidence Calibration: How well the confidence scores match actual accuracy
- Score: Points earned based on buzz position and correctness
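The first three tossup metrics can be computed from per-question results as sketched below. This assumes the agent buzzes on every question and normalizes buzz position to a fraction of the question length; the +10 / -5 scoring rule is a common quizbowl convention used here only as a placeholder, so substitute whatever rule the actual evaluation uses. `TossupResult` and `tossup_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder scoring rule (an assumption): +10 for a correct buzz, -5 for a wrong one.
CORRECT_POINTS = 10
INCORRECT_POINTS = -5


@dataclass
class TossupResult:
    correct: bool         # was the agent's answer right?
    buzz_position: int    # index of the word/token at which the agent buzzed
    question_length: int  # total words/tokens in the question


def tossup_metrics(results: List[TossupResult]) -> Dict[str, float]:
    """Accuracy, mean buzz position as a fraction of the question, and total score."""
    n = len(results)  # assumes at least one result
    accuracy = sum(r.correct for r in results) / n
    # 0.0 means buzzing at the very start, 1.0 means at the very end.
    avg_buzz = sum(r.buzz_position / r.question_length for r in results) / n
    score = sum(CORRECT_POINTS if r.correct else INCORRECT_POINTS for r in results)
    return {"accuracy": accuracy, "avg_buzz_fraction": avg_buzz, "score": score}


results = [
    TossupResult(True, 30, 100),
    TossupResult(False, 50, 100),
    TossupResult(True, 80, 100),
    TossupResult(True, 40, 100),
]
print(tossup_metrics(results))
```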
Bonus Metrics
- Accuracy: Percentage of correct answers across all parts
- Confidence Calibration: How well the confidence scores match actual accuracy
- Explanation Quality: Relevance and clarity of reasoning
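The guide does not define how calibration is measured. One common choice is expected calibration error (ECE) with equal-width confidence bins, sketched below alongside part-level accuracy; the same function works for tossup confidences. Treat the binning scheme and function names as assumptions rather than the official metric.

```python
from typing import List, Sequence, Tuple

Part = Tuple[float, bool]  # (confidence, answered_correctly)


def part_accuracy(parts: Sequence[Part]) -> float:
    """Fraction of bonus parts answered correctly (assumes at least one part)."""
    return sum(correct for _, correct in parts) / len(parts)


def expected_calibration_error(parts: Sequence[Part], n_bins: int = 10) -> float:
    """Equal-width-bin ECE: the size-weighted mean of |accuracy - mean confidence|
    over all non-empty confidence bins."""
    bins: List[List[Part]] = [[] for _ in range(n_bins)]
    for conf, correct in parts:
        # A confidence of exactly 1.0 goes into the last bin instead of overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    total = len(parts)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(c for _, c in b) / len(b)
        mean_conf = sum(p for p, _ in b) / len(b)
        ece += len(b) / total * abs(acc - mean_conf)
    return ece


parts = [(0.9, True), (0.9, True), (0.9, False), (0.2, False)]
print(part_accuracy(parts), expected_calibration_error(parts))
```

A perfectly calibrated agent has an ECE of 0; here the 0.9-confidence answers are right only two times out of three, which contributes most of the error.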
Evaluating Your Agent
Testing Baseline Performance
- Run the default agent configuration
- Record metrics (accuracy, confidence calibration, buzz position)
- Identify specific weaknesses in performance
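A practical way to find specific weaknesses (and to check consistency across topics) is to break accuracy down by question category. The sketch assumes each result is tagged with a category string; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (category, correct) pairs and return accuracy per category."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for category, correct in results:
        totals[category] += 1
        hits[category] += int(correct)
    return {c: hits[c] / totals[c] for c in totals}


def weakest_categories(results: Iterable[Tuple[str, bool]], k: int = 3) -> List[str]:
    """Return the k categories with the lowest accuracy (ties broken by name)."""
    acc = accuracy_by_category(results)
    return sorted(acc, key=lambda c: (acc[c], c))[:k]


results = [
    ("History", True), ("History", False),
    ("Science", True), ("Science", True),
    ("Literature", False), ("Literature", False),
]
print(weakest_categories(results, k=2))  # ['Literature', 'History']
```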
Validating Improvements
After each enhancement:
- Run the agent on the same development set of questions
- Compare metrics against the previous version
- Check for improvements in weak areas
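Comparing two runs is easier when each metric is labeled as improved, regressed, or unchanged. The sketch below assumes metrics are stored as name-to-value dictionaries; the metric names and the set of lower-is-better metrics are hypothetical. Note that treating buzz position as lower-is-better is a simplification, since an earlier buzz only helps when the answer is correct.

```python
from typing import Dict

# Hypothetical metric names where a smaller value is better.
LOWER_IS_BETTER = {"avg_buzz_fraction", "ece"}


def compare_runs(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, str]:
    """Label each metric present in both runs as improved, regressed, or unchanged."""
    report = {}
    for name in sorted(previous.keys() & current.keys()):
        delta = current[name] - previous[name]
        if name in LOWER_IS_BETTER:
            delta = -delta  # flip so that positive always means "better"
        if abs(delta) < 1e-9:
            report[name] = "unchanged"
        else:
            report[name] = "improved" if delta > 0 else "regressed"
    return report


prev = {"accuracy": 0.70, "avg_buzz_fraction": 0.60, "ece": 0.10}
cur = {"accuracy": 0.74, "avg_buzz_fraction": 0.62, "ece": 0.10}
print(compare_runs(prev, cur))
```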
Final Evaluation Criteria
Your final agent will be evaluated on:
- Overall accuracy across diverse questions
- Optimal buzz timing (neither too early nor too late)
- Confidence threshold calibration
- Explanation quality (for bonus agents)
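Buzz timing and confidence threshold calibration interact: a threshold that is too low causes early, incorrect buzzes, while one that is too high buzzes late or not at all. One way to pick a threshold is to sweep candidates on the development set and keep the best-scoring one, as sketched below. The +10 / -5 / 0 scoring rule and the per-chunk (confidence, correct) log are assumptions, not the official evaluation.

```python
from typing import List, Sequence, Tuple

# Placeholder scoring rule (an assumption): +10 correct, -5 wrong, 0 if never buzzing.
CORRECT_POINTS, INCORRECT_POINTS = 10, -5

# (confidence, guess_is_correct) recorded after each revealed chunk of one tossup.
Run = Sequence[Tuple[float, bool]]


def score_at_threshold(runs: List[Run], threshold: float) -> int:
    """Total dev-set score if the agent buzzes at the first chunk reaching `threshold`."""
    total = 0
    for run in runs:
        for confidence, correct in run:
            if confidence >= threshold:
                total += CORRECT_POINTS if correct else INCORRECT_POINTS
                break  # only one buzz per tossup
    return total


def best_threshold(runs: List[Run], candidates: Sequence[float]) -> Tuple[float, int]:
    """Return (threshold, score) for the best candidate; ties keep the earliest candidate."""
    return max(((t, score_at_threshold(runs, t)) for t in candidates), key=lambda ts: ts[1])


runs = [
    [(0.3, False), (0.6, True), (0.9, True)],
    [(0.55, False), (0.7, False), (0.95, True)],
    [(0.4, True), (0.8, True)],
]
print(best_threshold(runs, [0.5, 0.7, 0.9]))  # (0.9, 20)
```

On this toy data the lower thresholds lose points to an early wrong buzz on the second question, while 0.9 gives up the third question but still scores highest. With real data, a finer sweep over more questions gives a more reliable estimate.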