sunitha-ravi
committed on
Update README.md
README.md
CHANGED
@@ -98,8 +98,8 @@ The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/dataset
 
 | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| GPT-4o | <ins>87.9%</ins> | 84.3% | 85.3
-| GPT-4-Turbo | 86.0% | 85.0
+| GPT-4o | <ins>87.9%</ins> | 84.3% | <ins>85.3%</ins> | 84.3% | 95.0% | 82.1% | <ins>86.5%</ins> |
+| GPT-4-Turbo | 86.0% | <ins>85.0%</ins> | 82.2% | <ins>84.8%</ins> | 90.6% | 83.5% | 85.0% |
 | GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
 | Claude-3.5-Sonnet | 84.5% | 79.1% | 69.3% | 69.7% | 70.8% |84.8% |83.7%|
 | RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
@@ -107,7 +107,7 @@ The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/dataset
 | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
 | Llama-3-Instruct-70B | 87.0% | **83.8%** | 72.7% | 69.4% | 85.0% | 82.6% | 80.1% |
 | Lynx (8B) | 85.7% | 80.0% | 72.5% | **77.8%** | 96.3% | 85.2% | 82.9% |
-| Lynx v1.1 (8B) | **87.3%** | 79.9% | **75.6%** | 77.5% |
+| Lynx v1.1 (8B) | **87.3%** | 79.9% | **75.6%** | 77.5% | <ins>**96.9%**</ins> |<ins> **88.9%**</ins> | **84.3%** |
 
 ## Citation
 If you are using the model, cite using