nenad1002/quantum-research-bot-v1.0

nenad1002 committed on Sep 2, 2024

Commit 7853a8f (verified) · 1 Parent(s): a115162

Update README.md

Files changed (1): README.md +9 -14

@@ -127,25 +127,20 @@ After extensive grid search, supervised fine tuning of Llama 3.1-8B with LORA+ re
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
 [More Information Needed]

 ## Evaluation
+#### Metrics
+Because the fine-tuned model is designed to summarize newly learned data, ROUGE and BERTScore were measured on a sample of 50 manually crafted questions. The reference answers were constructed during the creation of the training and evaluation sets.
+Given that GPT-4-turbo was already used in this context, I did not compare my model against it. Instead, I compared it against the following models:
+| Metric           | quantum-research-bot-v1.0 | Meta-Llama-3.1-8B | gemini-1.5-pro |
+|:-----------------|:--------------------------|:------------------|:---------------|
+| **BERTScore F1** | 0.5821                    | 0.3305            | 0.4982         |
+| **ROUGE-1**      | 0.6045                    | 0.3152            | 0.5029         |
+| **ROUGE-2**      | 0.4098                    | 0.1751            | 0.3104         |
+| **ROUGE-L**      | 0.5809                    | 0.2902            | 0.4856         |
 [More Information Needed]
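
For readers unfamiliar with the ROUGE rows in the table added by this commit, the sketch below shows one common way such scores are computed. It is an illustration only: the commit does not say which library, tokenizer, or ROUGE variant (precision, recall, or F1) produced the reported numbers, so the whitespace tokenization, the F1 variant, and the helper names `rouge_n_f1`, `rouge_l_f1`, and `mean_score` are all assumptions. BERTScore is left out because it needs a pretrained embedding model.

```python
from collections import Counter


def _tokens(text):
    # Assumption: lowercase and split on whitespace; no stemming or
    # punctuation handling, unlike most production ROUGE packages.
    return text.lower().split()


def _f1(matches, cand_total, ref_total):
    # Harmonic mean of precision and recall; 0.0 when nothing matches.
    if matches == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N as an F1 score over clipped n-gram overlap."""
    cand, ref = _tokens(candidate), _tokens(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand_ngrams & ref_ngrams).values())
    return _f1(overlap, sum(cand_ngrams.values()), sum(ref_ngrams.values()))


def rouge_l_f1(candidate, reference):
    """ROUGE-L as an F1 score over the longest common subsequence."""
    cand, ref = _tokens(candidate), _tokens(reference)
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for c in cand:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            curr.append(prev[j - 1] + 1 if c == r else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


def mean_score(metric, candidates, references):
    # Average a per-pair metric over the evaluation sample
    # (for example, the 50 question/answer pairs mentioned in the README).
    scores = [metric(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)
```

Calling `mean_score(rouge_l_f1, model_answers, reference_answers)` with two equal-length lists of strings would give a single aggregate figure comparable in kind, though not necessarily in value, to a table cell.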