Maharshi Gor committed on
Commit 7acf14e · 1 Parent(s): 02b7dec

Configure Git LFS for PNG files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
docs/advanced-pipeline-examples.md ADDED
@@ -0,0 +1,73 @@
# Working with Advanced Pipeline Examples

This guide demonstrates how to load, modify, and run an existing advanced pipeline example, focusing on the two-step justified confidence model for tossup questions.

## Loading the Two-Step Justified Confidence Example

1. Navigate to the "Tossup Agents" tab at the top of the interface.

2. Click the "Select Pipeline to Import..." dropdown and choose "two-step-justified-confidence.yaml".

3. Click "Import Pipeline" to load the example into the interface.

## Understanding the Two-Step Pipeline Structure

The loaded pipeline has two distinct steps:

1. **Step A: Answer Generator**
   - Uses OpenAI/gpt-4o-mini
   - Takes question text as input
   - Generates an answer candidate
   - Uses a focused system prompt for answer generation only

2. **Step B: Confidence Evaluator**
   - Uses Cohere/command-r-plus
   - Takes the question text AND the generated answer from Step A
   - Evaluates confidence and provides justification
   - Uses a specialized system prompt for confidence evaluation

This separation of concerns allows each model to focus on a specific task:
- The first model concentrates solely on generating the most accurate answer
- The second model evaluates how confident we should be in that answer

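As a rough mental model, the exported configuration for a pipeline like this might look like the sketch below. The field names are illustrative rather than the exact schema; open the "Pipeline Preview" dropdown after importing to see the real YAML for this example.

```yaml
# Illustrative sketch only: field names are hypothetical, not the exact exported schema.
inputs: [question_text]
outputs: [answer, confidence, justification]
steps:
  A:
    name: Answer Generator
    model: OpenAI/gpt-4o-mini
    inputs:
      question: question_text        # pipeline variable fed to this step
    outputs:
      - {name: answer, type: str, description: Best-guess answer to the tossup}
  B:
    name: Confidence Evaluator
    model: Cohere/command-r-plus
    inputs:
      question: question_text
      proposed_answer: A.answer      # consumes Step A's output
    outputs:
      - {name: justification, type: str, description: Why the answer is or is not well supported}
      - {name: confidence, type: float, description: Confidence between 0.0 and 1.0}
```
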
## Modifying the Pipeline for Better Performance

Here are some ways to enhance the pipeline:

1. **Upgrade the Answer Generator**:
   - Click on Step A in the interface
   - Change the model from gpt-4o-mini to a more powerful model like gpt-4o
   - Modify the system prompt to include more specific instructions about quizbowl answer formatting

2. **Improve the Confidence Evaluator**:
   - Click on Step B
   - Add specific domain knowledge to the system prompt
   - For example, add: "Consider question length when evaluating confidence. Shorter, incomplete questions with less information revealed typically result in lower confidence scores."
   - Change the order of the output variables so that the model produces its justification before the confidence score, and hence conditions the confidence score on the justification (see the sketch below).

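   One way to picture that reordering: in the Outputs tab for Step B, `justification` sits above `confidence`. A hypothetical outputs block (again, not the exact exported schema) would read:

   ```yaml
   # Hypothetical outputs for Step B: justification listed, and therefore generated, before confidence
   outputs:
     - name: justification
       type: str
       description: Short explanation of how well the revealed clues support the answer
     - name: confidence
       type: float
       description: Calibrated confidence in the answer, between 0.0 and 1.0
   ```
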
## Running and Testing Your Modified Pipeline

1. After making your modifications, scroll down to adjust the buzzer settings:
   - Consider changing the confidence threshold based on the performance of your enhanced model
   - You might want to lower it slightly if you've improved the confidence evaluator

2. Test your modified pipeline:
   - Select a Question ID or use the provided sample question
   - Click "Run on Tossup Question"
   - Observe the answer, confidence score, and justification

3. Check the "Buzz Confidence" chart to see how confidence evolved during question processing

## Advantages of Multi-Step Pipelines

Multi-step pipelines offer several benefits:

1. **Specialized Models**: Use different models for different tasks (e.g., GPT for general knowledge, Claude for reasoning)

2. **Focused Prompting**: Each step can have a targeted system prompt optimized for its specific task

3. **Chain of Thought**: Build sophisticated reasoning by connecting steps in a logical sequence

4. **Better Confidence Calibration**: Dedicated confidence evaluation typically results in more reliable buzzing

5. **Transparency**: The justification output helps you understand why the model made certain decisions

docs/imgs/buzzer-settings.png ADDED

Git LFS Details

  • SHA256: 370ed9110cd42aee698e060e65fdad8649d373e78fb423a8962718707ece1b17
  • Pointer size: 131 Bytes
  • Size of remote file: 103 kB
docs/imgs/inputs-tab.png ADDED

Git LFS Details

  • SHA256: 6d40a821ea6ce78756240761a61b23145066d83d2878d75aa15a2b40f68b038b
  • Pointer size: 130 Bytes
  • Size of remote file: 55.2 kB
docs/imgs/outputs-tab.png ADDED

Git LFS Details

  • SHA256: 48bc47f2b4defd58cdda91faf478c9ab7dac9d29cc903f3c7c3389ad83ef7a34
  • Pointer size: 130 Bytes
  • Size of remote file: 87.7 kB
docs/imgs/tossup-viz.png ADDED

Git LFS Details

  • SHA256: ff53dc572479564e4c0f26fb73ce664a85ef49e4f0c3eafcd675a30de944ffe3
  • Pointer size: 131 Bytes
  • Size of remote file: 574 kB
docs/walkthrough.md ADDED
@@ -0,0 +1,169 @@
# Quizbowl Agent Web Interface Walkthrough

This walkthrough will help you create, test, and submit your quizbowl agent for both tossup and bonus questions using our web interface.

## Overview

Our web interface allows you to:
- Create and import pipeline workflows
- Configure tossup and bonus agents
- Test agents on sample questions
- Visualize agent performance
- Export your pipeline configurations to YAML files
- Submit your agents to our competition for full evaluation

## Creating a Tossup Agent

1. Navigate to the "Tossup Agents" tab at the top of the interface.

2. **Creating a Pipeline**:
   - Note the input variable `question_text` and output variables `answer` and `confidence` required for tossup agents (see the sketch below).
   - The default setup includes a single agent step labeled "A: Tossup Agent".
   - You can add more steps using the "+ Add Step" button for multi-step pipelines.

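   As a rough sketch, this is the contract your tossup pipeline must satisfy (field names are illustrative, not the exact exported schema):

   ```yaml
   # Illustrative only: the variables a tossup agent must accept and produce
   inputs:
     - question_text       # the (possibly partial) question text revealed so far
   outputs:
     - name: answer        # str: the agent's best guess
     - name: confidence    # float between 0.0 and 1.0, compared against the buzz threshold
   ```
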
3. **Configuring Your Agent**:
   - Select your preferred model from the dropdown (e.g., `OpenAI/gpt-4o-mini`).
   - Adjust the temperature slider (higher values give more varied output, lower values are more deterministic).
   - Click on the "System Prompt" tab to customize your agent's instructions.
   - Your system prompt is crucial: it tells the LLM how to interpret questions and format answers.

4. **Managing Input and Output Variables**:

   ![Inputs Tab](./imgs/inputs-tab.png)
   - Click the "Inputs" tab to view and modify input variables:
     - Each input has a "Variable Used" (how it's referenced in the pipeline), an "Input Name" (what the model sees), and a "Description" (explaining the input's purpose)
     - Use the "+" button to add a new input variable
     - Use the "×" button to remove an input variable

   ![Outputs Tab](./imgs/outputs-tab.png)
   - Click the "Outputs" tab to manage output variables:
     - Each output has an "Output Field" name, a "Type" dropdown (str, float, etc.) to define the data type, and a "Description" explaining the output's purpose
     - Use the arrow buttons to change the output order
     - Use the "+" button to add a new output
     - Use the "×" button to remove an output

   - When adding a new output, consider the following types (a sketch follows the list):
     - `str`: For text outputs like answers
     - `float`: For numerical outputs like confidence scores (0.0-1.0)
     - `bool`: For true/false outputs
     - `list`: For array outputs like multiple answer candidates

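   A hypothetical outputs block using one output of each type might look like this (names and descriptions are just examples):

   ```yaml
   # Hypothetical example: one output of each supported type
   outputs:
     - {name: answer,      type: str,   description: Final answer string}
     - {name: confidence,  type: float, description: Score between 0.0 and 1.0}
     - {name: should_buzz, type: bool,  description: Whether the agent wants to buzz now}
     - {name: candidates,  type: list,  description: Alternative answer candidates}
   ```
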
5. **Buzzer Settings**:

   ![Buzzer Settings](./imgs/buzzer-settings.png)
   - Scroll down to the "Buzzer settings" section.
   - Set your confidence threshold (e.g., 0.85): your agent will only buzz when its confidence exceeds this value (see the sketch below).
   - Choose a method (AND/OR) if combining multiple conditions.
   - Adjust probability settings if needed.

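   Conceptually, the buzzer is just a rule over your pipeline's outputs. A sketch of the idea, not the literal exported format:

   ```yaml
   # Sketch of the buzzing rule, not the literal exported format
   buzzer:
     method: AND               # how multiple conditions are combined (AND / OR)
     conditions:
       - confidence >= 0.85    # buzz only once confidence clears the threshold
   ```
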
6. **Testing Your Agent**:
   - Enter a Question ID or use the provided sample question.
   - Check "Early Stop" if you want to stop processing once the agent buzzes.
   - Click "Run on Tossup Question" to test your agent.
   - Review the answer and click "Buzz Confidence" to see confidence metrics.

7. **Understanding Tossup Results Visualization**:

   ![Tossup Results Visualization showing a question about Sigmund Freud with a confidence graph](./imgs/tossup-viz.png)

   After running a tossup question, you'll see the results displayed in several ways:
   - **Highlighted Question Text**:
     - Terms are highlighted throughout the question at the points where the agent was evaluated.
     - Click any highlighted word to see an answer popup with the confidence at that point.
     - The buzz point appears in green or red (e.g., "Eckstein's") depending on whether the agent was correct.
   - **Answer Popup**:
     - Displays the final answer, confidence score, and correctness indicator
     - Appears when hovering over the buzz point or over highlighted terms
   - **Buzz Confidence Graph**:
     - X-axis: token position; Y-axis: confidence (0.0-1.0)
     - Blue line shows confidence progression
     - Amber vertical line marks the buzz point
     - Dashed horizontal line shows the confidence threshold

   This visualization helps you evaluate:
   - Which clues are most informative
   - How well the confidence threshold is calibrated
   - Whether the agent should buzz more aggressively or more conservatively

8. **Exporting Your Pipeline**:
   - Once you're satisfied with your agent, click "Export Pipeline" to save your configuration.
   - Click the "Pipeline Preview" dropdown to see the YAML configuration.
   - You can download this configuration for future use or submission.

## Creating a Bonus Agent

1. Navigate to the "Bonus Round Agents" tab at the top of the interface.

2. **Creating a Pipeline**:
   - Note the input variables `leadin` and `part` and the output variables `answer`, `confidence`, and `explanation` required for bonus agents (see the sketch below).
   - The default setup includes a single agent step labeled "A: Bonus Agent".

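   As with tossups, think of this as the contract the pipeline must satisfy (field names are illustrative, not the exact exported schema):

   ```yaml
   # Illustrative only: the variables a bonus agent must accept and produce
   inputs:
     - leadin               # the bonus lead-in paragraph
     - part                 # the specific bonus part being answered
   outputs:
     - name: answer         # str
     - name: confidence     # float between 0.0 and 1.0
     - name: explanation    # str: the agent's reasoning for its answer
   ```
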
3. **Configuring Your Agent**:
   - Select your preferred model from the dropdown.
   - Adjust the temperature slider.
   - Customize the system prompt to provide instructions for handling bonus questions.
   - Include clear formatting expectations for answer, confidence, and explanation.

4. **Testing Your Agent**:
   - Enter a bonus question with leadin and part text.
   - Click the appropriate run button to test your agent.
   - Review the agent's answer, confidence, and explanation.

5. **Exporting Your Pipeline**:
   - Click "Export Pipeline" to save your configuration.

## Advanced Features

### Multi-step Pipelines

1. **Adding Steps**:
   - Click "+ Add Step" to add additional processing steps.
   - Each step can use different models or system prompts.
   - Use steps for different tasks like analysis, answer generation, and confidence calculation.

2. **Variable Mapping**:
   - Connect outputs from earlier steps to inputs of later steps (see the sketch below).
   - The final output variables must map to your defined outputs (answer, confidence, etc.).

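For example, a two-step tossup pipeline might wire its variables roughly as follows (hypothetical field names; the YAML exported by the interface may differ):

```yaml
# Hypothetical wiring: Step B consumes Step A's output, and the pipeline's
# final outputs are mapped from the steps that produce them.
steps:
  A:
    outputs: [answer]
  B:
    inputs:
      question: question_text
      proposed_answer: A.answer    # an earlier step's output used as an input
    outputs: [confidence, justification]
final_outputs:
  answer: A.answer
  confidence: B.confidence
```
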
### Importing Existing Pipelines

1. Click the "Select Pipeline to Import..." dropdown.
2. Choose an existing pipeline to load its configuration.
3. Click "Import Pipeline" to load it into the interface.
4. Modify as needed for your specific use case.

For a detailed example of importing and modifying a sophisticated multi-step pipeline, see our [Advanced Pipeline Examples](./advanced-pipeline-examples.md) guide, which walks through enhancing the two-step justified confidence model.

## Submitting Your Agent

1. **Evaluate Your Agent**:
   - Before submission, click "Evaluate" to run a thorough assessment.
   - This helps identify potential issues before formal submission.

2. **Model Submission**:
   - Fill in the "Model Name" and "Description" fields with appropriate information.
   - Click "Sign in with Hugging Face" to authenticate.
   - Click "Submit" to submit your agent for official evaluation.

## Best Practices

1. **System Prompts**:
   - Be specific about output formats in your system prompts.
   - For tossups, instruct the model to provide confidence scores.
   - For bonuses, instruct the model to explain its reasoning.

2. **Confidence Calibration**:
   - Fine-tune your buzzer threshold based on testing.
   - Too high: the agent might miss answerable questions.
   - Too low: the agent might buzz incorrectly.

3. **Testing Thoroughly**:
   - Test your agent on various question types and difficulties.
   - Check performance on both early and late clues for tossups.

Good luck with your quizbowl agent submission!