sunitha-ravi committed
Commit 503b9b9 · verified · 1 Parent(s): 71c5d75

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -100,8 +100,8 @@ It outperforms GPT-3.5-Turbo, GPT-4-Turbo, GPT-4o and Claude-3-Sonnet on HaluEva
 
  | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
- | GPT-4o | 87.9% | 84.3% | 85.3% | 84.3% | 95.0% | 82.1% | 86.5% |
- | GPT-4-Turbo | 86.0% | 85.0% | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
+ | GPT-4o | 87.9% | 84.3% | **85.3%** | 84.3% | 95.0% | 82.1% | 86.5% |
+ | GPT-4-Turbo | 86.0% | **85.0%** | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
  | GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
  | Claude-3-Sonnet | 84.5% | 79.1% | 69.7% | 84.3% | 95.0% | 82.9% | 78.8% |
  | Claude-3-Haiku | 68.9% | 78.9% | 58.4% | 84.3% | 95.0% | 82.9% | 69.0% |
@@ -110,7 +110,7 @@ It outperforms GPT-3.5-Turbo, GPT-4-Turbo, GPT-4o and Claude-3-Sonnet on HaluEva
  | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
  | Llama-3-Instruct-70B | 87.0% | 83.8% | 72.7% | 69.4% | 85.0% | 82.6% | 80.1% |
  | LYNX (8B) | 85.7% | 80.0% | 72.5% | 77.8% | 96.3% | 85.2% | 82.9% |
- | LYNX (70B) | 88.4% | 80.2% | 81.4% | 86.4% | 97.5% | 90.4% | 87.4% |
+ | LYNX (70B) | **88.4%** | 80.2% | 81.4% | **86.4%** | **97.5%** | **90.4%** | **87.4%** |
 
 
  ## Citation
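
The three changed rows bold the highest score in each benchmark column among the rows visible in this diff. Below is a minimal Python sketch of how that bolding could be reproduced or checked. It is not the author's method: `COLUMNS`, `SCORES`, `best_per_column`, and `render_row` are hypothetical names introduced here for illustration. The README rows that fall between the two hunks (lines 108-109) are not shown in the diff, so they are omitted and could change the result.

```python
# Scores copied from the rows visible in the diff. README lines 108-109
# are not shown in the diff and are intentionally left out (assumption:
# they do not hold a column maximum).
COLUMNS = ["HaluEval", "RAGTruth", "FinanceBench", "DROP",
           "CovidQA", "PubmedQA", "Overall"]

SCORES = {
    "GPT-4o":               [87.9, 84.3, 85.3, 84.3, 95.0, 82.1, 86.5],
    "GPT-4-Turbo":          [86.0, 85.0, 82.2, 84.8, 90.6, 83.5, 85.0],
    "GPT-3.5-Turbo":        [62.2, 50.7, 60.9, 57.2, 56.7, 62.8, 58.7],
    "Claude-3-Sonnet":      [84.5, 79.1, 69.7, 84.3, 95.0, 82.9, 78.8],
    "Claude-3-Haiku":       [68.9, 78.9, 58.4, 84.3, 95.0, 82.9, 69.0],
    "Llama-3-Instruct-8B":  [83.1, 80.0, 55.0, 58.2, 75.2, 70.7, 70.4],
    "Llama-3-Instruct-70B": [87.0, 83.8, 72.7, 69.4, 85.0, 82.6, 80.1],
    "LYNX (8B)":            [85.7, 80.0, 72.5, 77.8, 96.3, 85.2, 82.9],
    "LYNX (70B)":           [88.4, 80.2, 81.4, 86.4, 97.5, 90.4, 87.4],
}


def best_per_column(scores):
    """Return {column: (model, score)} for the highest score in each column."""
    best = {}
    for i, col in enumerate(COLUMNS):
        model = max(scores, key=lambda m: scores[m][i])
        best[col] = (model, scores[model][i])
    return best


def render_row(model, values, best):
    """Render one Markdown table row, bolding cells that equal the column maximum."""
    cells = []
    for col, value in zip(COLUMNS, values):
        text = f"{value:.1f}%"
        if value == best[col][1]:
            text = f"**{text}**"
        cells.append(text)
    return "| " + " | ".join([model] + cells) + " |"


if __name__ == "__main__":
    best = best_per_column(SCORES)
    for model, values in SCORES.items():
        print(render_row(model, values, best))
```

Comparing cells by value rather than by model name means that tied maxima, if any appeared, would all be bolded.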