nenad1002 committed
Commit: e8dc32f
Parent: 6cdc647

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -127,12 +127,14 @@ The final evaluation cross-entropy ended around 0.4.
 Since the fine-tuned model is designed to explain and, if possible, summarize newly learned data, ROUGE and BERTScore metrics were measured on a sample of 50 manually crafted questions. The reference answers were constructed during the creation of the training and evaluation sets.
 Given that GPT-4-turbo was already used in this context for reference question generation, I did not compare my model against it. Instead, I chose to compare it against the following models:
 
-| Metric (mean) | quantum-research-bot-v1.0 | Meta-Llama-3.1-8B-Instruct | gemini-1.5-pro |
+| Metric (mean/avg) | quantum-research-bot-v1.0 | Meta-Llama-3.1-8B-Instruct | gemini-1.5-pro |
 |:------------------|:---------------------------|:--------------------|:------------------|
 | **BERTScore F1** | 0.5821 | 0.3305 | 0.4982 |
 | **ROUGE-1** | 0.6045 | 0.3152 | 0.5029 |
 | **ROUGE-2** | 0.4098 | 0.1751 | 0.3104 |
 | **ROUGE-L** | 0.5809 | 0.2902 | 0.4856 |
+| **BLEU** | 0.2538 | 0.0736 | 0.1753 |
+
 
 _quantum-research-bot-v1.0_ outperformed both baselines on all metrics, although _Gemini_ came within 0.001 on BERTScore precision. The Gemini model is better at recognizing subtle differences in the input, but it lacks the latest knowledge, so it performs worse overall.
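
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how BERTScore, ROUGE, and BLEU scores like those in the table can be computed with the Hugging Face `evaluate` library. The `predictions` and `references` lists are placeholder examples, and this is not necessarily the exact pipeline used for the numbers above.

```python
# Minimal sketch: scoring model answers against reference answers with the
# Hugging Face `evaluate` library. The two lists below are placeholder data,
# not the 50-question evaluation set described in the README.
import evaluate

predictions = [
    "Majorana zero modes are quasiparticles proposed as a basis for topological qubits.",
]
references = [
    "Majorana zero modes are quasiparticle excitations studied for topological quantum computing.",
]

# BERTScore compares contextual embeddings; the table reports the mean F1.
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=predictions, references=references, lang="en")
print("BERTScore F1 (mean):", sum(bs["f1"]) / len(bs["f1"]))

# ROUGE-1/2/L measure n-gram and longest-common-subsequence overlap.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# BLEU is the n-gram precision metric whose row this commit adds.
bleu = evaluate.load("bleu")
print("BLEU:", bleu.compute(predictions=predictions, references=references)["bleu"])
```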