# Quizbowl Agent Goals and Evaluation

## Objectives

### Tossup Agents
- Answer each question with the best guess and a calibrated confidence score
- Buzz as early as possible once there is enough information to answer
- Avoid incorrect buzzes
- Maintain consistent performance across topics
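
The trade-off between buzzing early and avoiding incorrect buzzes can be pictured as a simple confidence threshold. The sketch below is illustrative only: the `Guess` record, the `first_buzz` helper, the token-based position, and the 0.8 threshold are assumptions made for this example, not part of any provided agent interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Guess:
    """One guess made after seeing part of a tossup (hypothetical structure)."""
    position: int      # number of question tokens revealed so far (assumed unit)
    answer: str
    confidence: float  # agent's estimated probability of being correct, in [0, 1]


def first_buzz(guesses: List[Guess], threshold: float = 0.8) -> Optional[Guess]:
    """Return the earliest guess whose confidence reaches the threshold, else None."""
    for guess in sorted(guesses, key=lambda g: g.position):
        if guess.confidence >= threshold:
            return guess
    return None


# Illustrative guesses as more of the question is revealed.
guesses = [
    Guess(10, "Thomas Jefferson", 0.35),
    Guess(20, "James Madison", 0.62),
    Guess(30, "James Madison", 0.91),
]
buzz = first_buzz(guesses)
print(buzz.position if buzz else "no buzz")  # 30
```

Lowering the threshold makes the agent buzz earlier but raises the risk of an incorrect buzz; raising it does the opposite.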

### Bonus Agents
- Answer each part correctly, with an accurate confidence estimate
- Provide a clear explanation of the reasoning, which human team members will use to validate or select the suggested answer
- Adapt to varying difficulty levels (easy, medium, hard)

## Performance Metrics

### Tossup Metrics
- **Accuracy**: Percentage of questions answered correctly
- **Average Buzz Position**: How early in the question the agent buzzes (earlier is better)
- **Confidence Calibration**: How well the confidence score matches the actual rate of correct answers
- **Score**: Points earned based on buzz position and correctness
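
As a rough illustration, accuracy and average buzz position could be computed from logged results as follows. The `TossupResult` record, the token-based position, and the choice to normalize buzz position by question length are assumptions for this sketch. The scoring formula is not specified here, so score is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TossupResult:
    """Outcome of one tossup (hypothetical logging format)."""
    correct: bool
    buzz_position: int    # token index at which the agent buzzed (assumed unit)
    question_length: int  # total number of tokens in the question


def tossup_summary(results: List[TossupResult]) -> Dict[str, float]:
    """Compute accuracy and mean buzz position as a fraction of question length."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    accuracy = sum(r.correct for r in results) / n
    avg_buzz_fraction = sum(r.buzz_position / r.question_length for r in results) / n
    return {"accuracy": accuracy, "avg_buzz_fraction": avg_buzz_fraction}


# Made-up results, for illustration only.
results = [
    TossupResult(True, 40, 100),
    TossupResult(False, 20, 100),
    TossupResult(True, 60, 120),
    TossupResult(True, 90, 100),
]
summary = tossup_summary(results)
print(summary)
```

A lower `avg_buzz_fraction` means earlier buzzing, so it should be read together with accuracy rather than on its own.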

### Bonus Metrics
- **Accuracy**: Percentage of correct answers across all parts
- **Confidence Calibration**: How well the confidence score matches the actual rate of correct answers
- **Explanation Quality**: Relevance and clarity of the reasoning
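
Calibration can be quantified in several ways. One common choice is expected calibration error (ECE), sketched below. This is an assumption about how calibration might be measured, not necessarily the measure the evaluation uses; the function name and the 10-bin default are illustrative.

```python
from typing import List, Tuple


def expected_calibration_error(
    preds: List[Tuple[float, bool]], n_bins: int = 10
) -> float:
    """ECE over (confidence, correct) pairs using equal-width confidence bins."""
    if not preds:
        raise ValueError("no predictions given")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    total = len(preds)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# An overconfident agent: 90% confidence but only half the answers correct.
overconfident = [(0.9, True), (0.9, False)]
ece_value = expected_calibration_error(overconfident)
print(round(ece_value, 3))  # 0.4
```

An ECE of 0 means stated confidence matches observed accuracy in every bin; larger values indicate over- or under-confidence.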

## Evaluating Your Agent

### Testing Baseline Performance
1. Run the default agent configuration
2. Record metrics (accuracy, confidence, buzz position)
3. Identify specific weaknesses in performance

### Validating Improvements
After each enhancement:
1. Run the agent on the same development set of questions
2. Compare the metrics to those of the previous version
3. Check for improvements in the weak areas identified earlier
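
The comparison step can be as simple as a per-metric difference between two runs on the same questions. The helper name, the metric keys, and all numbers below are made up for illustration.

```python
from typing import Dict


def compare_metrics(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs."""
    return {
        name: candidate[name] - baseline[name]
        for name in baseline
        if name in candidate
    }


# Hypothetical metric values from two runs on the same development set.
baseline = {"accuracy": 0.60, "avg_buzz_fraction": 0.75}
candidate = {"accuracy": 0.68, "avg_buzz_fraction": 0.70}

deltas = compare_metrics(baseline, candidate)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

Note that the sign means different things per metric: a positive change in accuracy is an improvement, while for buzz position a negative change (earlier buzzing) is the improvement.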

### Final Evaluation Criteria
Your final agent will be evaluated on:
1. Overall accuracy across diverse questions
2. Optimal buzz timing (neither too early nor too late)
3. Confidence threshold calibration
4. Explanation quality (for bonus agents)

<!-- ## Setting Goals for Your Agent

### Minimum Goals
- Accuracy above 60%
- Appropriate confidence threshold (0.7-0.9)
- Reasonable buzz positions

### Advanced Goals
- Multi-step pipelines with specialized components
- Accuracy above 85%
- Strategic early buzzing on familiar topics
- Detailed, accurate explanations for bonus questions  -->