nenad1002 committed on
Commit
7853a8f
1 Parent(s): a115162

Update README.md

Files changed (1)
  1. README.md +9 -14
README.md CHANGED
@@ -127,25 +127,20 @@ After extensive grid search, supervised fine-tuning of Llama 3.1-8B with LORA+ re
 
 ## Evaluation
 
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-<!-- This should link to a Dataset Card if possible. -->
-
-[More Information Needed]
-
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
-
-#### Metrics
-
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+#### Metrics
+
+Since the fine-tuned model is designed to summarize newly learned data, ROUGE and BERTScore metrics were measured on a sample of 50 manually crafted questions. The reference answers were constructed during the creation of the training and evaluation sets.
+Given that GPT-4-turbo was already used in this context, I did not compare my model against it. Instead, I chose to compare it against the following models:
+
+| Metric           | quantum-research-bot-v1.0 | Meta-Llama-3.1-8B | gemini-1.5-pro |
+|:-----------------|:--------------------------|:------------------|:---------------|
+| **BERTScore F1** | 0.5821                    | 0.3305            | 0.4982         |
+| **ROUGE-1**      | 0.6045                    | 0.3152            | 0.5029         |
+| **ROUGE-2**      | 0.4098                    | 0.1751            | 0.3104         |
+| **ROUGE-L**      | 0.5809                    | 0.2902            | 0.4856         |
 
 [More Information Needed]
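
To illustrate what the ROUGE numbers in the new table measure, here is a minimal pure-Python sketch of ROUGE-1 and ROUGE-L F1 between a reference answer and a candidate summary. This is an assumption about the general metric definitions, not the commit author's evaluation code, which presumably used standard packages such as `rouge_score` and `bert_score`:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 (ROUGE-1) between reference and candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

def rougeL_f1(reference: str, candidate: str) -> float:
    """F1 based on the longest common subsequence of tokens (ROUGE-L)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ref = "the model summarizes recent quantum research accurately"
    cand = "the model accurately summarizes recent quantum research"
    print(rouge1_f1(ref, cand))      # identical unigram multisets -> 1.0
    print(rougeL_f1(ref, cand))      # word order differs, so LCS penalizes it
```

Note how ROUGE-1 ignores word order while ROUGE-L does not, which is why the ROUGE-L scores in the table sit slightly below ROUGE-1 for every model.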