The final evaluation cross-entropy ended around 0.4.

Since the fine-tuned model is designed to explain and, if possible, summarize newly learned data, ROUGE and BERTScore metrics were measured on a sample of 50 manually crafted questions. The reference answers were constructed during the creation of the training and evaluation sets.

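As a point of reference, the table below could be reproduced with a script along these lines, using the Hugging Face `evaluate` library (with the `rouge_score` and `bert_score` backends installed). This is a minimal sketch, assuming a JSONL file of question/reference/prediction records; the file name and field names are illustrative, not this project's actual artifacts.

```python
# Sketch of the metric computation -- the eval file name and record fields
# ("eval_questions.jsonl", "reference", "prediction") are assumptions.
import json

import evaluate

# Each line: {"question": ..., "reference": ..., "prediction": ...}
with open("eval_questions.jsonl") as f:
    records = [json.loads(line) for line in f]

predictions = [r["prediction"] for r in records]
references = [r["reference"] for r in records]

rouge = evaluate.load("rouge")          # reports mean F1 for ROUGE-1/2/L
bertscore = evaluate.load("bertscore")  # per-example precision/recall/F1 lists
bleu = evaluate.load("bleu")            # corpus-level BLEU

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(
    predictions=predictions, references=references, lang="en"
)
# BLEU accepts multiple references per prediction, so wrap each in a list.
bleu_scores = bleu.compute(
    predictions=predictions, references=[[r] for r in references]
)

print({
    "BERTScore F1": sum(bert_scores["f1"]) / len(bert_scores["f1"]),
    "ROUGE-1": rouge_scores["rouge1"],
    "ROUGE-2": rouge_scores["rouge2"],
    "ROUGE-L": rouge_scores["rougeL"],
    "BLEU": bleu_scores["bleu"],
})
```
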
Given that GPT-4-turbo was already used in this context to generate the reference questions, I did not compare my model against it. Instead, I chose to compare it against the following models:

| Metric (mean)    | quantum-research-bot-v1.0 | Meta-Llama-3.1-8B-Instruct | gemini-1.5-pro |
|:-----------------|:--------------------------|:---------------------------|:---------------|
| **BERTScore F1** | 0.5821                    | 0.3305                     | 0.4982         |
| **ROUGE-1**      | 0.6045                    | 0.3152                     | 0.5029         |
| **ROUGE-2**      | 0.4098                    | 0.1751                     | 0.3104         |
| **ROUGE-L**      | 0.5809                    | 0.2902                     | 0.4856         |
| **BLEU**         | 0.2538                    | 0.0736                     | 0.1753         |

_quantum-research-bot-v1.0_ outperformed both baselines on every metric, although _Gemini_ came close in BERTScore precision, trailing by only 0.001. The Gemini model is better at recognizing subtle differences in the input, but it lacks the latest knowledge, which makes it perform worse overall.