quizbowl-submission / docs /goals-and-evaluation.md

Maharshi Gor

Added better documentation

0f6850b 16 days ago


Quizbowl Agent Goals and Evaluation

Objectives

Tossup Agents

Respond to each question with the best guess and a calibrated confidence score
Buzz as early as possible once there is sufficient information to answer
Avoid incorrect buzzes
Maintain consistent performance across topics

Bonus Agents

Answer parts correctly with accurate confidence estimation
Provide a clear explanation of the reasoning, which human team members will use to validate or choose the suggested answer
Adapt to varying difficulty levels (easy, medium, hard)

Performance Metrics

Tossup Metrics

Accuracy: Percentage of correct answers
Average Buzz Position: How early in the question the agent buzzes (earlier is better, provided the answer is correct)
Confidence Calibration: How well the confidence score matches actual performance
Score: Points earned based on buzz position and correctness
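
As a rough illustration of the first three tossup metrics, the sketch below computes them from a list of per-question results. The `TossupResult` record and its field names are hypothetical, not part of the submission framework, and expected calibration error is used here as one common way to measure calibration, not necessarily the one the leaderboard uses. Score is left out because the exact point values are not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class TossupResult:
    """One tossup attempt. All field names are hypothetical."""
    correct: bool         # was the answer given at the buzz correct?
    buzz_position: int    # 1-based token index at which the agent buzzed
    question_length: int  # total number of tokens in the question
    confidence: float     # confidence in [0, 1] reported at the buzz


def accuracy(results):
    """Fraction of tossups answered correctly."""
    return sum(r.correct for r in results) / len(results)


def average_buzz_position(results):
    """Mean buzz position as a fraction of question length (lower is earlier)."""
    return sum(r.buzz_position / r.question_length for r in results) / len(results)


def expected_calibration_error(results, n_bins=10):
    """Group results into confidence bins, then average the gap between
    mean confidence and accuracy in each bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for r in results:
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(r.confidence for r in b) / len(b)
        bin_acc = sum(r.correct for r in b) / len(b)
        ece += (len(b) / len(results)) * abs(mean_conf - bin_acc)
    return ece
```

A calibration error of 0 means that, within every confidence bin, the stated confidence equals the observed accuracy; larger values indicate over- or under-confidence.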

Bonus Metrics

Accuracy: Percentage of correct answers across all parts
Confidence Calibration: How well the confidence score matches actual performance
Explanation Quality: Relevance and clarity of reasoning
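
Bonus accuracy is counted over individual parts, not whole bonuses. A minimal sketch, assuming each part is available as a hypothetical `(difficulty, correct)` pair, also breaks accuracy down by difficulty level so that weak spots on hard parts are visible. Explanation quality is not covered here, since it needs human or model-based judgment.

```python
from collections import defaultdict


def bonus_part_accuracy(bonuses):
    """Compute part-level accuracy overall and per difficulty.

    `bonuses` is a list of bonuses; each bonus is a list of
    (difficulty, correct) tuples. This layout is an assumption
    made for illustration only.
    """
    # Flatten all bonuses into a single list of parts.
    parts = [part for bonus in bonuses for part in bonus]
    overall = sum(correct for _, correct in parts) / len(parts)

    # Group correctness flags by difficulty label.
    by_difficulty = defaultdict(list)
    for difficulty, correct in parts:
        by_difficulty[difficulty].append(correct)
    per_difficulty = {d: sum(v) / len(v) for d, v in by_difficulty.items()}
    return overall, per_difficulty
```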

Evaluating Your Agent

Testing Baseline Performance

Run the default agent configuration
Record metrics (accuracy, confidence, buzz position)
Identify specific weaknesses in performance

Validating Improvements

After each enhancement:

Run the agent on the same development set of questions
Compare metrics to previous version
Check for improvements in weak areas
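
The comparison step can be sketched as a small helper that takes two metric dictionaries, one from the previous version and one from the current version, both measured on the same development set. The metric names and the `lower_is_better` default are illustrative assumptions, not names defined by the submission framework.

```python
def compare_metrics(previous, current,
                    lower_is_better=("average_buzz_position", "calibration_error")):
    """Report the change in each metric present in both runs.

    `previous` and `current` map metric names to floats. Metrics listed
    in `lower_is_better` count as improved when they decrease; all
    others count as improved when they increase.
    """
    report = {}
    for name in sorted(previous.keys() & current.keys()):
        delta = current[name] - previous[name]
        improved = delta < 0 if name in lower_is_better else delta > 0
        report[name] = {"delta": delta, "improved": improved}
    return report
```

Keeping the development set fixed between runs matters here: a delta is only meaningful when both versions answered the same questions.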

Final Evaluation Criteria

Your final agent will be evaluated on:

Overall accuracy across diverse questions
Optimal buzz timing (neither too early nor too late)
Confidence threshold calibration
Explanation quality (for bonus agents)