Maharshi Gor committed
Commit 7acf14e · 1 parent: 02b7dec
Configure Git LFS for PNG files
- .gitattributes +1 -0
- docs/advanced-pipeline-examples.md +73 -0
- docs/imgs/buzzer-settings.png +3 -0
- docs/imgs/inputs-tab.png +3 -0
- docs/imgs/outputs-tab.png +3 -0
- docs/imgs/tossup-viz.png +3 -0
- docs/walkthrough.md +169 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
*.png filter=lfs diff=lfs merge=lfs -text
|
docs/advanced-pipeline-examples.md
ADDED
@@ -0,0 +1,73 @@
1 |
+
# Working with Advanced Pipeline Examples
|
2 |
+
|
3 |
+
This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.
|
4 |
+
|
5 |
+
## Loading the Two-Step Justified Confidence Example
|
6 |
+
|
7 |
+
1. Navigate to the "Tossup Agents" tab at the top of the interface.
|
8 |
+
|
9 |
+
2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".
|
10 |
+
|
11 |
+
3. Click "Import Pipeline" to load the example into the interface.
|
12 |
+
|
13 |
+
## Understanding the Two-Step Pipeline Structure
|
14 |
+
|
15 |
+
The loaded pipeline has two distinct steps:
|
16 |
+
|
17 |
+
1. **Step A: Answer Generator**
|
18 |
+
- Uses OpenAI/gpt-4o-mini
|
19 |
+
- Takes question text as input
|
20 |
+
- Generates an answer candidate
|
21 |
+
- Uses a focused system prompt for answer generation only
|
22 |
+
|
23 |
+
2. **Step B: Confidence Evaluator**
|
24 |
+
- Uses Cohere/command-r-plus
|
25 |
+
- Takes the question text AND the generated answer from Step A
|
26 |
+
- Evaluates confidence and provides justification
|
27 |
+
- Uses a specialized system prompt for confidence evaluation
|
28 |
+
|
29 |
+
This separation of concerns allows each model to focus on a specific task:
|
30 |
+
- The first model concentrates solely on generating the most accurate answer
|
31 |
+
- The second model evaluates how confident we should be in that answer
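As a rough illustration of the structure above, the two steps can be thought of as two chained functions. This is a minimal sketch, not the interface's actual implementation: `generate_answer`, `evaluate_confidence`, `TossupOutput`, and the stub logic inside them are hypothetical stand-ins for the real model calls.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TossupOutput:
    answer: str
    confidence: float
    justification: str


def generate_answer(question_text: str) -> str:
    """Step A (hypothetical stub): a real step would call the answer model."""
    return "Placeholder Answer"


def evaluate_confidence(question_text: str, answer: str) -> Tuple[float, str]:
    """Step B (hypothetical stub): a real step would call the evaluator model."""
    # Toy heuristic standing in for a model call: more revealed text, more confidence.
    num_words = len(question_text.split())
    confidence = min(1.0, num_words / 50)
    justification = f"{num_words} words of the question were available."
    return confidence, justification


def run_pipeline(question_text: str) -> TossupOutput:
    # Step A sees only the question text.
    answer = generate_answer(question_text)
    # Step B sees the question text and the answer produced by Step A.
    confidence, justification = evaluate_confidence(question_text, answer)
    return TossupOutput(answer, confidence, justification)


result = run_pipeline("This author wrote a novel about a whale")
print(result)
```

Keeping the steps as separate functions mirrors the separation of concerns: either step can be swapped for a different model without touching the other.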
|
32 |
+
|
33 |
+
## Modifying the Pipeline for Better Performance
|
34 |
+
|
35 |
+
Here are some ways to enhance the pipeline:
|
36 |
+
|
37 |
+
1. **Upgrade the Answer Generator**:
|
38 |
+
- Click on Step A in the interface
|
39 |
+
- Change the model from gpt-4o-mini to a more powerful model like gpt-4o
|
40 |
+
- Modify the system prompt to include more specific instructions about quizbowl answer formatting
|
41 |
+
|
42 |
+
2. **Improve the Confidence Evaluator**:
|
43 |
+
- Click on Step B
|
44 |
+
- Add specific domain knowledge to the system prompt
|
45 |
+
- For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
|
46 |
+
- Change the order of the output variables so that the model produces the justification before the confidence score, and therefore conditions its confidence score on the justification.
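Ordering matters because a language model generates its response left to right, so any field it writes first becomes context for the fields that follow. The sketch below shows the idea with a hypothetical helper, `format_instructions`, that turns an ordered list of fields into a format instruction. The helper, the instruction wording, and the assumed original order are illustrative only and are not taken from the example pipeline.

```python
from typing import List, Tuple


def format_instructions(output_fields: List[Tuple[str, str]]) -> str:
    """Render (name, type) pairs as a format instruction, preserving their order."""
    lines = ["Respond with a JSON object containing these keys, in this order:"]
    for position, (name, type_name) in enumerate(output_fields, start=1):
        lines.append(f"{position}. {name} ({type_name})")
    return "\n".join(lines)


# Assumed original order: the score is produced before the reasoning.
before = [("confidence", "float"), ("justification", "str")]
# Reordered: the reasoning comes first, so the score can build on it.
after = [("justification", "str"), ("confidence", "float")]

print(format_instructions(after))
```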
|
47 |
+
|
48 |
+
## Running and Testing Your Modified Pipeline
|
49 |
+
|
50 |
+
1. After making your modifications, scroll down to adjust the buzzer settings:
|
51 |
+
- Consider changing the confidence threshold based on the performance of your enhanced model
|
52 |
+
- You might want to lower it slightly if you've improved the confidence evaluator
|
53 |
+
|
54 |
+
2. Test your modified pipeline:
|
55 |
+
- Select a Question ID or use the provided sample question
|
56 |
+
- Click "Run on Tossup Question"
|
57 |
+
- Observe the answer, confidence score, and justification
|
58 |
+
|
59 |
+
3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing
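A confidence curve of this kind can be pictured as the result of running the agent on progressively longer prefixes of the question. The following is a minimal sketch under assumptions: it evaluates after every word (the interface may evaluate at fewer points), `toy_agent` is a hypothetical stand-in for the pipeline, and the `early_stop` flag mirrors the "Early Stop" option described in the walkthrough.

```python
from typing import Callable, List, Optional, Tuple

# An agent maps a partial question to (answer, confidence).
Agent = Callable[[str], Tuple[str, float]]


def confidence_trace(
    question_text: str,
    agent: Agent,
    threshold: float,
    early_stop: bool = True,
) -> Tuple[List[Tuple[int, str, float]], Optional[int]]:
    """Run the agent on growing prefixes; return the trace and the buzz position."""
    words = question_text.split()
    trace = []
    buzz_position = None
    for position in range(1, len(words) + 1):
        prefix = " ".join(words[:position])
        answer, confidence = agent(prefix)
        trace.append((position, answer, confidence))
        # Record the first position where confidence exceeds the threshold.
        if buzz_position is None and confidence > threshold:
            buzz_position = position
            if early_stop:
                break
    return trace, buzz_position


def toy_agent(prefix: str) -> Tuple[str, float]:
    """Hypothetical stand-in: confidence grows with the amount of text seen."""
    return "Placeholder Answer", min(1.0, 0.2 * len(prefix.split()))


trace, buzz_position = confidence_trace(
    "one two three four five six", toy_agent, threshold=0.85
)
print(buzz_position)  # 5
```

With `early_stop=False` the loop continues to the end of the question, which yields the full curve rather than stopping at the buzz.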
|
60 |
+
|
61 |
+
## Advantages of Multi-Step Pipelines
|
62 |
+
|
63 |
+
Multi-step pipelines offer several benefits:
|
64 |
+
|
65 |
+
1. **Specialized Models**: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)
|
66 |
+
|
67 |
+
2. **Focused Prompting**: Each step can have a targeted system prompt optimized for its specific task
|
68 |
+
|
69 |
+
3. **Chain of Thought**: Build sophisticated reasoning by connecting steps in a logical sequence
|
70 |
+
|
71 |
+
4. **Better Confidence Calibration**: Dedicated confidence evaluation typically results in more reliable buzzing
|
72 |
+
|
73 |
+
5. **Transparency**: The justification output helps you understand why the model made certain decisions
|
docs/imgs/buzzer-settings.png
ADDED
|
docs/imgs/inputs-tab.png
ADDED
|
docs/imgs/outputs-tab.png
ADDED
|
docs/imgs/tossup-viz.png
ADDED
|
docs/walkthrough.md
ADDED
@@ -0,0 +1,169 @@
1 |
+
# Quizbowl Agent Web Interface Walkthrough
|
2 |
+
|
3 |
+
This walkthrough guide will help you create, test, and submit your quizbowl agent for both tossup and bonus questions using our web interface.
|
4 |
+
|
5 |
+
## Overview
|
6 |
+
|
7 |
+
Our web interface allows you to:
|
8 |
+
- Create and import pipeline workflows
|
9 |
+
- Configure tossup and bonus agents
|
10 |
+
- Test agents on sample questions
|
11 |
+
- Visualize agent performance
|
12 |
+
- Export your pipeline configurations to YAML files
|
13 |
+
- Submit your agents to our competition for full evaluation
|
14 |
+
|
15 |
+
## Creating a Tossup Agent
|
16 |
+
|
17 |
+
1. Navigate to the "Tossup Agents" tab at the top of the interface.
|
18 |
+
|
19 |
+
2. **Creating a Pipeline**:
|
20 |
+
- Note the input variable `question_text` and the output variables `answer` and `confidence`, which are required for tossup agents.
|
21 |
+
- The default setup includes a single agent step labeled "A: Tossup Agent".
|
22 |
+
- You can add more steps using the "+ Add Step" button for multi-step pipelines.
|
23 |
+
|
24 |
+
3. **Configuring Your Agent**:
|
25 |
+
- Select your preferred model from the dropdown (e.g., `OpenAI/gpt-4o-mini`).
|
26 |
+
- Adjust the temperature slider (higher values give more varied output; lower values give more deterministic output).
|
27 |
+
- Click on the "System Prompt" tab to customize your agent's instructions.
|
28 |
+
- Your system prompt is crucial: it tells the LLM how to interpret questions and format answers.
|
29 |
+
|
30 |
+
4. **Managing Input and Output Variables**:
|
31 |
+
|
32 |
+

|
33 |
+
- Click the "Inputs" tab to view and modify input variables:
|
34 |
+
- Each input has a "Variable Used" (how it's referenced in the pipeline)
|
35 |
+
- "Input Name" (what the model sees)
|
36 |
+
- "Description" (explains the input's purpose)
|
37 |
+
- Use the "+" button to add a new input variable
|
38 |
+
- Use the "×" button to remove an input variable
|
39 |
+
|
40 |
+

|
41 |
+
- Click the "Outputs" tab to manage output variables:
|
42 |
+
- Each output has an "Output Field" name
|
43 |
+
- "Type" dropdown (str, float, etc.) to define the data type
|
44 |
+
- "Description" explaining the output's purpose
|
45 |
+
- Use arrow buttons to change output order
|
46 |
+
- Use the "+" button to add a new output
|
47 |
+
- Use the "×" button to remove an output
|
48 |
+
|
49 |
+
- When adding a new output, consider the following types:
|
50 |
+
- `str`: For text outputs like answers
|
51 |
+
- `float`: For numerical outputs like confidence scores (0.0-1.0)
|
52 |
+
- `bool`: For true/false outputs
|
53 |
+
- `list`: For array outputs like multiple answer candidates
|
54 |
+
|
55 |
+
5. **Buzzer Settings**:
|
56 |
+
|
57 |
+

|
58 |
+
- Scroll down to the "Buzzer settings" section.
|
59 |
+
- Set your confidence threshold (e.g., 0.85); your agent buzzes only when its confidence exceeds this value.
|
60 |
+
- Choose a method (AND/OR) if combining multiple conditions.
|
61 |
+
- Adjust probability settings if needed.
|
62 |
+
|
63 |
+
6. **Testing Your Agent**:
|
64 |
+
- Enter a Question ID or use the provided sample question.
|
65 |
+
- Check "Early Stop" if you want to stop processing once the agent buzzes.
|
66 |
+
- Click "Run on Tossup Question" to test your agent.
|
67 |
+
- Review the answer and click "Buzz Confidence" to see confidence metrics.
|
68 |
+
|
69 |
+
7. **Understanding Tossup Results Visualization**:
|
70 |
+
|
71 |
+

|
72 |
+
|
73 |
+
After running a tossup question, you'll see the results displayed in several ways:
|
74 |
+
- **Highlighted Question Text**:
|
75 |
+
- The words at which the agent was evaluated are highlighted throughout the question.
|
76 |
+
- Click any highlighted word to see an answer popup with the confidence at that point.
|
77 |
+
- The buzz point (e.g., "Eckstein's") appears in green if the agent answered correctly and in red otherwise.
|
78 |
+
- **Answer Popup**:
|
79 |
+
- Displays final answer, confidence score, and correctness indicator
|
80 |
+
- Appears when hovering over the buzz point or any highlighted term
|
81 |
+
|
82 |
+
- **Buzz Confidence Graph**:
|
83 |
+
- X-axis: token position; Y-axis: confidence (0.0-1.0)
|
84 |
+
- Blue line shows confidence progression
|
85 |
+
- Amber vertical line marks buzz point
|
86 |
+
- Dashed horizontal line shows confidence threshold
|
87 |
+
|
88 |
+
This visualization helps evaluate:
|
89 |
+
- Most informative clues
|
90 |
+
- Confidence threshold calibration
|
91 |
+
- Whether agent should be more aggressive or conservative
|
92 |
+
|
93 |
+
8. **Exporting Your Pipeline**:
|
94 |
+
- Once you're satisfied with your agent, click "Export Pipeline" to save your configuration.
|
95 |
+
- Click the "Pipeline Preview" dropdown to see the YAML configuration.
|
96 |
+
- You can download this configuration for future use or submission.
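The buzzer settings from step 5 can be pictured as a small decision rule. This is a sketch under assumptions, not the interface's actual logic: the walkthrough does not specify what the probability setting measures, so `probability` and `probability_threshold` below are hypothetical, as is the function name `should_buzz`.

```python
from typing import Optional


def should_buzz(
    confidence: float,
    probability: Optional[float] = None,
    confidence_threshold: float = 0.85,
    probability_threshold: Optional[float] = None,
    method: str = "AND",
) -> bool:
    """Decide whether to buzz by combining threshold conditions with AND or OR."""
    # The agent buzzes only when confidence exceeds the threshold.
    conditions = [confidence > confidence_threshold]
    # Hypothetical second condition, used only when both values are supplied.
    if probability is not None and probability_threshold is not None:
        conditions.append(probability > probability_threshold)
    if method == "AND":
        return all(conditions)
    if method == "OR":
        return any(conditions)
    raise ValueError(f"unknown method: {method!r}")


print(should_buzz(0.9))                                            # True
print(should_buzz(0.9, 0.3, probability_threshold=0.5))            # False
print(should_buzz(0.9, 0.3, probability_threshold=0.5, method="OR"))  # True
```

With a single condition the two methods behave identically; the choice only matters once a second condition is configured.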
|
97 |
+
|
98 |
+
## Creating a Bonus Agent
|
99 |
+
|
100 |
+
1. Navigate to the "Bonus Round Agents" tab at the top of the interface.
|
101 |
+
|
102 |
+
2. **Creating a Pipeline**:
|
103 |
+
- Note the input variables `leadin` and `part` and the output variables `answer`, `confidence`, and `explanation`, which are required for bonus agents.
|
104 |
+
- The default setup includes a single agent step labeled "A: Bonus Agent".
|
105 |
+
|
106 |
+
3. **Configuring Your Agent**:
|
107 |
+
- Select your preferred model from the dropdown.
|
108 |
+
- Adjust the temperature slider.
|
109 |
+
- Customize the system prompt to provide instructions for handling bonus questions.
|
110 |
+
- Include clear formatting expectations for answer, confidence, and explanation.
|
111 |
+
|
112 |
+
4. **Testing Your Agent**:
|
113 |
+
- Enter a bonus question with leadin and part text.
|
114 |
+
- Click the appropriate run button to test your agent.
|
115 |
+
- Review the agent's answer, confidence, and explanation.
|
116 |
+
|
117 |
+
5. **Exporting Your Pipeline**:
|
118 |
+
- Click "Export Pipeline" to save your configuration.
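When testing a bonus agent, it can help to check that each response contains the required output variables with the expected types. The sketch below assumes the model's response has already been parsed into a dictionary. The names `OUTPUT_TYPES`, `BONUS_OUTPUTS`, and `validate_outputs` are hypothetical, and the type assigned to each bonus output is an assumption based on the output types listed in the tossup section.

```python
OUTPUT_TYPES = {"str": str, "float": float, "bool": bool, "list": list}

# Required bonus outputs; the type chosen for each field is an assumption.
BONUS_OUTPUTS = {"answer": "str", "confidence": "float", "explanation": "str"}


def validate_outputs(raw: dict, schema: dict) -> dict:
    """Check that every declared output is present and has the declared type."""
    validated = {}
    for name, type_name in schema.items():
        if name not in raw:
            raise KeyError(f"missing output field: {name}")
        value = raw[name]
        expected = OUTPUT_TYPES[type_name]
        # Accept plain ints where a float is expected (e.g. a confidence of 1).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise TypeError(f"{name} should be {type_name}, got {type(value).__name__}")
        validated[name] = value
    # Confidence scores are expected to lie in the range 0.0-1.0.
    if "confidence" in validated and not 0.0 <= validated["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return validated


checked = validate_outputs(
    {"answer": "Placeholder Answer", "confidence": 1, "explanation": "Stub reasoning."},
    BONUS_OUTPUTS,
)
print(checked)
```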
|
119 |
+
|
120 |
+
## Advanced Features
|
121 |
+
|
122 |
+
### Multi-step Pipelines
|
123 |
+
|
124 |
+
1. **Adding Steps**:
|
125 |
+
- Click "+ Add Step" to add additional processing steps.
|
126 |
+
- Each step can use different models or system prompts.
|
127 |
+
- Use steps for different tasks like analysis, answer generation, and confidence calculation.
|
128 |
+
|
129 |
+
2. **Variable Mapping**:
|
130 |
+
- Connect outputs from earlier steps to inputs for later steps.
|
131 |
+
- The final output variables must map to your defined outputs (answer, confidence, etc.).
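Variable mapping can be sketched as a lookup table that grows as each step runs. The example below is illustrative only: the `"A.answer"` reference style, the step dictionaries, and `run_steps` are hypothetical and do not reflect the interface's real configuration format.

```python
from typing import Any, Callable, Dict, List


def run_steps(
    pipeline_inputs: Dict[str, Any],
    steps: List[Dict[str, Any]],
    output_map: Dict[str, str],
) -> Dict[str, Any]:
    """Run steps in order, resolving each input reference against earlier values."""
    values = dict(pipeline_inputs)  # every variable available so far
    for step in steps:
        # Look up each input of this step by its reference name.
        step_inputs = {name: values[ref] for name, ref in step["inputs"].items()}
        step_outputs = step["run"](**step_inputs)
        # Publish the outputs as "<step id>.<field>" for later steps to use.
        for field, value in step_outputs.items():
            values[f"{step['id']}.{field}"] = value
    # Map the final pipeline outputs to the step outputs that supply them.
    return {name: values[ref] for name, ref in output_map.items()}


steps = [
    {
        "id": "A",
        "inputs": {"question": "question_text"},
        "run": lambda question: {"answer": "Placeholder Answer"},
    },
    {
        "id": "B",
        "inputs": {"question": "question_text", "answer": "A.answer"},
        "run": lambda question, answer: {"confidence": 0.9},
    },
]

final = run_steps(
    {"question_text": "Sample tossup text"},
    steps,
    {"answer": "A.answer", "confidence": "B.confidence"},
)
print(final)
```

A reference to a variable that no earlier step has produced raises a `KeyError`, which is one way a mis-wired mapping would surface.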
|
132 |
+
|
133 |
+
### Importing Existing Pipelines
|
134 |
+
|
135 |
+
1. Click the "Select Pipeline to Import..." dropdown.
|
136 |
+
2. Choose an existing pipeline to load its configuration.
|
137 |
+
3. Click "Import Pipeline" to load it into the interface.
|
138 |
+
4. Modify as needed for your specific use case.
|
139 |
+
|
140 |
+
For a detailed example of importing and modifying a sophisticated multi-step pipeline, see our [Advanced Pipeline Examples](./advanced-pipeline-examples.md) guide, which walks through enhancing the two-step justified confidence model.
|
141 |
+
|
142 |
+
## Submitting Your Agent
|
143 |
+
|
144 |
+
1. **Evaluate Your Agent**:
|
145 |
+
- Before submission, click "Evaluate" to run a thorough assessment.
|
146 |
+
- This helps identify potential issues before formal submission.
|
147 |
+
|
148 |
+
2. **Model Submission**:
|
149 |
+
- Fill in the "Model Name" and "Description" fields with appropriate information.
|
150 |
+
- Click "Sign in with Hugging Face" to authenticate.
|
151 |
+
- Click "Submit" to submit your agent for official evaluation.
|
152 |
+
|
153 |
+
## Best Practices
|
154 |
+
|
155 |
+
1. **System Prompts**:
|
156 |
+
- Be specific about output formats in your system prompts.
|
157 |
+
- For tossups, instruct the model to provide confidence scores.
|
158 |
+
- For bonuses, instruct the model to explain its reasoning.
|
159 |
+
|
160 |
+
2. **Confidence Calibration**:
|
161 |
+
- Fine-tune your buzzer threshold based on testing.
|
162 |
+
- Too high: agent might miss answerable questions.
|
163 |
+
- Too low: agent might buzz incorrectly.
|
164 |
+
|
165 |
+
3. **Testing Thoroughly**:
|
166 |
+
- Test your agent on various question types and difficulties.
|
167 |
+
- Check performance on both early and late clues for tossups.
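The trade-off described under confidence calibration can be made concrete by sweeping the threshold over a set of test results. This is a minimal sketch: `sweep_thresholds` is a hypothetical helper, and the `(confidence, correct)` pairs are made-up illustrative values, not results from any real agent.

```python
from typing import List, Optional, Tuple


def sweep_thresholds(
    results: List[Tuple[float, bool]],
    thresholds: List[float],
) -> List[Tuple[float, float, Optional[float]]]:
    """For each threshold, report the buzz rate and the accuracy of the buzzes."""
    rows = []
    for threshold in thresholds:
        # Keep the correctness flags of the questions the agent would buzz on.
        buzzed = [correct for confidence, correct in results if confidence > threshold]
        buzz_rate = len(buzzed) / len(results)
        accuracy = sum(buzzed) / len(buzzed) if buzzed else None
        rows.append((threshold, buzz_rate, accuracy))
    return rows


# Made-up (confidence, was_correct) pairs for illustration only.
results = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False)]

for threshold, buzz_rate, accuracy in sweep_thresholds(results, [0.5, 0.75, 0.92]):
    print(threshold, buzz_rate, accuracy)
```

In this toy data, raising the threshold lowers the buzz rate while raising the accuracy of the remaining buzzes, which is the pattern to look for when choosing a value.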
|
168 |
+
|
169 |
+
Good luck with your quizbowl agent submission!
|