metadata
license: llama3.1
language:
- en
library_name: transformers
tags:
- mergekit
- merge
base_model:
- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B
Cathallama
Awesome model, my new daily driver. Edit: I am seeing many generated tokens that map to unknown Unicode code points, which did not show up during testing of this model, so I have stopped using it and am working on a new version.
Notable Performance
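- 9 percentage point increase in overall MMLU-PRO success rate over Meta-Llama-3.1-70B-Instruct at Q4_0 (51.0% vs 42.0%)
- Highest or tied-highest score in 11 of the 14 MMLU-PRO categories tested
- Passed 11 of 14 manual test cases, more than any of the three source models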
Creation workflow
Models merged
- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B
flowchart TD
A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
B -->| | E[Merge]
D -->| | E[Merge]
E[Merge] -->|Result| F[Cathallama]
Testing
Hyperparameters
- Temperature: 0.0 for automated, 0.9 for manual
- Penalize repeat sequence: 1.05
- Consider N tokens for penalize: 256
- Penalize repetition of newlines
- Top-K sampling: 40
- Top-P sampling: 0.95
- Min-P sampling: 0.05
llama.cpp Version
- Build: b3527-2-g2d5dd7bb
- Flags: -fa -ngl -1 -ctk f16 --no-mmap
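The sampling settings listed under Hyperparameters can be written down as a single request payload. The sketch below assumes the models are served through the llama.cpp HTTP server and its `/completion` endpoint; the card does not say which front end was used, so that is an assumption. `build_payload` is a hypothetical helper and the prompt is a placeholder.

```python
import json


def build_payload(prompt: str, automated: bool) -> dict:
    """Return the sampling parameters for one request (hypothetical helper)."""
    return {
        "prompt": prompt,
        # 0.0 for automated benchmarks, 0.9 for manual testing
        "temperature": 0.0 if automated else 0.9,
        "repeat_penalty": 1.05,  # "Penalize repeat sequence"
        "repeat_last_n": 256,    # "Consider N tokens for penalize"
        "penalize_nl": True,     # "Penalize repetition of newlines"
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
    }


# Placeholder prompt; assumes a manual (non-benchmark) run.
payload = build_payload("Why is the sky blue?", automated=False)
print(json.dumps(payload, indent=2))
```

Keeping the two temperature settings behind one flag makes it harder to run a benchmark with the manual-testing temperature by accident.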
Tested Files
- Cathallama-70B.Q4_0.gguf
- Nexusflow_Athene-70B.Q4_0.gguf
- turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
- Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
Tests
Manual testing
Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
---|---|---|---|---|---|
Common Sense | Ball on cup | OK | KO | KO | OK |
Common Sense | Big duck small horse | KO | OK | KO | OK |
Common Sense | Killers | OK | OK | KO | OK |
Common Sense | Strawberry r's | OK | KO | KO | KO |
Common Sense | 9.11 or 9.9 bigger | KO | OK | OK | KO |
Common Sense | Dragon or lens | KO | KO | KO | KO |
Common Sense | Shirts | OK | OK | KO | KO |
Common Sense | Sisters | OK | KO | KO | KO |
Common Sense | Jane faster | OK | OK | OK | OK |
Programming | JSON | OK | OK | OK | OK |
Programming | Python snake game | OK | KO | KO | KO |
Math | Door window combination | OK | OK | KO | KO |
Smoke | Poem | OK | OK | OK | OK |
Smoke | Story | OK | OK | KO | OK |
Note: See sample_generations.txt in the root folder of the repo for the raw generations.
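The manual results can be tallied per model. This is a minimal sketch that copies the OK/KO outcomes from the table above, in row order; the short model keys and the `results` layout are illustrative choices, not part of the original test setup.

```python
from collections import Counter

# One string per model, one outcome per test case, in table row order.
results = {
    "Cathallama": "OK KO OK OK KO KO OK OK OK OK OK OK OK OK",
    "Athene":     "KO OK OK KO OK KO OK KO OK OK KO OK OK OK",
    "Cat-Llama":  "KO KO KO KO OK KO KO KO OK OK KO KO OK KO",
    "Llama-3.1":  "OK OK OK KO KO KO KO KO OK OK KO KO OK OK",
}

# Count passed test cases for each model.
passes = {model: Counter(row.split())["OK"] for model, row in results.items()}

for model, ok in passes.items():
    print(f"{model}: {ok}/14 passed")
```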
MMLU-PRO
Model | Success % |
---|---|
Cathallama-70B.Q4_0.gguf | 51.0% |
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |
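The overall scores allow two readings of the gain over Meta-Llama-3.1-70B-Instruct: an absolute difference in percentage points and a relative increase. A minimal sketch of that arithmetic, using only the numbers in the table above (the variable names are illustrative):

```python
# Overall MMLU-PRO success rates (%) copied from the table.
scores = {
    "Cathallama-70B": 51.0,
    "turboderp_Cat-Llama-3-70B-instruct": 37.0,
    "Nexusflow_Athene-70B": 41.0,
    "Meta-Llama-3.1-70B-Instruct": 42.0,
}

baseline = scores["Meta-Llama-3.1-70B-Instruct"]
merged = scores["Cathallama-70B"]

absolute_gain = merged - baseline                # difference in percentage points
relative_gain = absolute_gain / baseline * 100   # increase relative to the baseline

print(f"{absolute_gain:.1f} percentage points, {relative_gain:.1f}% relative")
```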
MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
---|---|---|---|---|
Business | 50.0% | 45.0% | 20.0% | 40.0% |
Law | 40.0% | 30.0% | 30.0% | 35.0% |
Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
Biology | 80.0% | 70.0% | 85.0% | 80.0% |
Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
History | 65.0% | 60.0% | 55.0% | 65.0% |
Other | 55.0% | 50.0% | 45.0% | 50.0% |
Health | 75.0% | 40.0% | 60.0% | 65.0% |
Economics | 80.0% | 75.0% | 65.0% | 70.0% |
Math | 45.0% | 35.0% | 15.0% | 40.0% |
Physics | 50.0% | 45.0% | 45.0% | 45.0% |
Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
Engineering | 35.0% | 40.0% | 25.0% | 35.0% |
Note: MMLU-PRO overall was tested with 100 questions. Categories were tested with 20 questions each.
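With 20 questions per category, each question moves a category score by 5 percentage points, so small gaps between models amount to one or two answers. The sketch below copies the category table and lists where Cathallama has the highest or tied-highest score; the tuple order and variable names are illustrative.

```python
# Scores (%) per category: (Cathallama, Athene, Cat-Llama-3, Llama 3.1)
rows = {
    "Business": (50, 45, 20, 40),
    "Law": (40, 30, 30, 35),
    "Psychology": (85, 80, 70, 75),
    "Biology": (80, 70, 85, 80),
    "Chemistry": (55, 40, 35, 35),
    "History": (65, 60, 55, 65),
    "Other": (55, 50, 45, 50),
    "Health": (75, 40, 60, 65),
    "Economics": (80, 75, 65, 70),
    "Math": (45, 35, 15, 40),
    "Physics": (50, 45, 45, 45),
    "Computer Science": (60, 55, 55, 60),
    "Philosophy": (55, 60, 45, 50),
    "Engineering": (35, 40, 25, 35),
}

# Categories where Cathallama (index 0) matches the row maximum.
best_or_tied = [cat for cat, s in rows.items() if s[0] == max(s)]
behind = [cat for cat in rows if cat not in best_or_tied]

questions_per_category = 20
point_step = 100 / questions_per_category  # percentage points per question

print(f"Best or tied in {len(best_or_tied)} of {len(rows)} categories")
print(f"Behind in: {', '.join(behind)}")
print(f"One question = {point_step:.0f} percentage points")
```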
PubmedQA
Model | Success % |
---|---|
Cathallama-70B.Q4_0.gguf | 73.00% |
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |
Request
If you are hiring in the EU or can sponsor a visa, PM me :D
PS. Thank you mradermacher for the GGUFs!