metadata

license: llama3.1
language:
  - en
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - meta-llama/Meta-Llama-3.1-70B-Instruct
  - turboderp/Cat-Llama-3-70B-instruct
  - Nexusflow/Athene-70B

Cathallama

Awesome model, my new daily driver. Edit: I am seeing many generated tokens that point to unknown Unicode code points, something that did not show up during testing, so I have stopped using this model and am working on a new version.

Notable Performance

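9 percentage point increase in overall MMLU-PRO success rate over Llama 3.1 70B Instruct at Q4_0 (51.0% vs. 42.0%)
Best or tied-best score among the tested models in 11 of 14 MMLU-PRO categories
11 of 14 manual test cases passed, more than any of the three source models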

Creation workflow

Models merged

meta-llama/Meta-Llama-3.1-70B-Instruct
turboderp/Cat-Llama-3-70B-instruct
Nexusflow/Athene-70B

flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B -->| | E[Merge]
    D -->| | E[Merge]
    E[Merge] -->|Result| F[Cathallama]

Testing

Hyperparameters

Temperature: 0.0 for automated, 0.9 for manual
Penalize repeat sequence: 1.05
Consider N tokens for penalize: 256
Penalize repetition of newlines
Top-K sampling: 40
Top-P sampling: 0.95
Min-P sampling: 0.05
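For anyone who wants to reproduce these settings against a llama.cpp server, here is a minimal sketch of the request body. The labels above look like the ones from the llama.cpp server web UI, so mapping them to the JSON fields `repeat_penalty`, `repeat_last_n` and `penalize_nl` is an assumption, as is the helper name `build_completion_payload`. Field names can differ between llama.cpp builds, so check them against the build you run.

```python
import json


def build_completion_payload(prompt: str, automated: bool = True) -> dict:
    """Return a request body using the sampling settings listed in this card.

    The field names are assumed to match the llama.cpp server completion API
    of roughly the version listed below; they are not taken from the card.
    """
    return {
        "prompt": prompt,
        # 0.0 for automated benchmarks, 0.9 for manual testing
        "temperature": 0.0 if automated else 0.9,
        "repeat_penalty": 1.05,   # "Penalize repeat sequence"
        "repeat_last_n": 256,     # "Consider N tokens for penalize"
        "penalize_nl": True,      # "Penalize repetition of newlines"
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
    }


if __name__ == "__main__":
    # Hypothetical prompt, only to show the resulting JSON.
    payload = build_completion_payload("Write a short poem.", automated=False)
    print(json.dumps(payload, indent=2))
```

The payload is built separately from any HTTP call so it can be inspected or logged before it is sent.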

llama.cpp Version

b3527-2-g2d5dd7bb
-fa -ngl -1 -ctk f16 --no-mmap
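As a rough sketch, the flags above can be turned into a full command line like this. The binary name `llama-server`, the `-m` model argument and the model path are assumptions for illustration only. The card lists just the flags, not which llama.cpp binary was run.

```python
import shlex

# Flags copied verbatim from this card.
FLAGS_FROM_CARD = "-fa -ngl -1 -ctk f16 --no-mmap"


def build_command(model_path: str, binary: str = "llama-server") -> list[str]:
    """Build an argv list for a llama.cpp binary.

    `binary` and `model_path` are hypothetical; only the flags come from the card:
      -fa        enable flash attention
      -ngl -1    GPU layer offload setting, value kept exactly as listed
      -ctk f16   use f16 for the K part of the KV cache
      --no-mmap  do not memory-map the model file
    """
    return [binary, "-m", model_path, *shlex.split(FLAGS_FROM_CARD)]


if __name__ == "__main__":
    print(shlex.join(build_command("Cathallama-70B.Q4_0.gguf")))
```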

Tested Files

Cathallama-70B.Q4_0.gguf
Nexusflow_Athene-70B.Q4_0.gguf
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

Tests

Manual testing

Category	Test Case	Cathallama-70B.Q4_0.gguf	Nexusflow_Athene-70B.Q4_0.gguf	turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf	Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
Common Sense	Ball on cup	OK	KO	KO	OK
	Big duck small horse	KO	OK	KO	OK
	Killers	OK	OK	KO	OK
	Strawberry r's	OK	KO	KO	KO
	9.11 or 9.9 bigger	KO	OK	OK	KO
	Dragon or lens	KO	KO	KO	KO
	Shirts	OK	OK	KO	KO
	Sisters	OK	KO	KO	KO
	Jane faster	OK	OK	OK	OK
Programming	JSON	OK	OK	OK	OK
	Python snake game	OK	KO	KO	KO
Math	Door window combination	OK	OK	KO	KO
Smoke	Poem	OK	OK	OK	OK
	Story	OK	OK	KO	OK

Note: See sample_generations.txt in the main folder of the repo for the raw generations.
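To make the manual results easier to compare, the OK/KO table can be tallied per model. This is just a small sketch: the outcomes are copied from the table above, while the shortened model labels and the function name `pass_counts` are made up for the example.

```python
# Shortened labels for the four tested files, in table column order.
MODELS = ["Cathallama", "Athene", "Cat-Llama-3", "Llama-3.1"]

# One entry per test case; outcomes are in MODELS order, copied from the table.
RESULTS = {
    "Ball on cup": "OK KO KO OK",
    "Big duck small horse": "KO OK KO OK",
    "Killers": "OK OK KO OK",
    "Strawberry r's": "OK KO KO KO",
    "9.11 or 9.9 bigger": "KO OK OK KO",
    "Dragon or lens": "KO KO KO KO",
    "Shirts": "OK OK KO KO",
    "Sisters": "OK KO KO KO",
    "Jane faster": "OK OK OK OK",
    "JSON": "OK OK OK OK",
    "Python snake game": "OK KO KO KO",
    "Door window combination": "OK OK KO KO",
    "Poem": "OK OK OK OK",
    "Story": "OK OK KO OK",
}


def pass_counts(results: dict) -> dict:
    """Count how many test cases each model passed (outcome == "OK")."""
    counts = dict.fromkeys(MODELS, 0)
    for row in results.values():
        for model, outcome in zip(MODELS, row.split()):
            if outcome == "OK":
                counts[model] += 1
    return counts


if __name__ == "__main__":
    for model, passed in pass_counts(RESULTS).items():
        print(f"{model}: {passed}/{len(RESULTS)}")
```

With the table as given, this comes out to 11, 9, 4 and 7 passes out of 14 for Cathallama, Athene, Cat-Llama-3 and Llama 3.1 respectively.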

MMLU-PRO

Model	Success %
Cathallama-70B.Q4_0.gguf	51.0%
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf	37.0%
Nexusflow_Athene-70B.Q4_0.gguf	41.0%
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf	42.0%
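Since "9%" can be read either as percentage points or as a relative improvement, here is a quick sketch of both calculations using the overall scores above. The function name `gains` is just for illustration.

```python
# Overall MMLU-PRO success rates (percent), copied from the table above.
SCORES = {
    "Cathallama-70B.Q4_0.gguf": 51.0,
    "turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf": 37.0,
    "Nexusflow_Athene-70B.Q4_0.gguf": 41.0,
    "Meta-Llama-3.1-70B-Instruct.Q4_0.gguf": 42.0,
}


def gains(candidate: str, baseline: str) -> tuple[float, float]:
    """Return (percentage point gain, relative gain in percent) over baseline."""
    points = SCORES[candidate] - SCORES[baseline]
    relative = 100.0 * points / SCORES[baseline]
    return points, relative


if __name__ == "__main__":
    points, relative = gains(
        "Cathallama-70B.Q4_0.gguf", "Meta-Llama-3.1-70B-Instruct.Q4_0.gguf"
    )
    print(f"{points:.1f} percentage points, {relative:.1f}% relative")
```

Against Llama 3.1 70B Instruct this gives 9.0 percentage points, which is about a 21.4% relative improvement.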

MMLU-PRO category	Cathallama-70B.Q4_0.gguf	Nexusflow_Athene-70B.Q4_0.gguf	turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf	Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
Business	50.0%	45.0%	20.0%	40.0%
Law	40.0%	30.0%	30.0%	35.0%
Psychology	85.0%	80.0%	70.0%	75.0%
Biology	80.0%	70.0%	85.0%	80.0%
Chemistry	55.0%	40.0%	35.0%	35.0%
History	65.0%	60.0%	55.0%	65.0%
Other	55.0%	50.0%	45.0%	50.0%
Health	75.0%	40.0%	60.0%	65.0%
Economics	80.0%	75.0%	65.0%	70.0%
Math	45.0%	35.0%	15.0%	40.0%
Physics	50.0%	45.0%	45.0%	45.0%
Computer Science	60.0%	55.0%	55.0%	60.0%
Philosophy	55.0%	60.0%	45.0%	50.0%
Engineering	35.0%	40.0%	25.0%	35.0%

Note: MMLU-PRO overall was tested with 100 questions. Categories were tested with 20 questions each.
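With only 20 questions per category, each question moves a score by 5 points, so small differences in the category table should be read with care. The sketch below, with values copied from that table, counts where Cathallama is ahead of, level with, or behind the best of the three source models. The shortened column labels and the function name `summarize` are assumptions for the example.

```python
# Column order: Cathallama, Athene, Cat-Llama-3, Llama-3.1 (percent correct).
CATEGORY_SCORES = {
    "Business": (50, 45, 20, 40),
    "Law": (40, 30, 30, 35),
    "Psychology": (85, 80, 70, 75),
    "Biology": (80, 70, 85, 80),
    "Chemistry": (55, 40, 35, 35),
    "History": (65, 60, 55, 65),
    "Other": (55, 50, 45, 50),
    "Health": (75, 40, 60, 65),
    "Economics": (80, 75, 65, 70),
    "Math": (45, 35, 15, 40),
    "Physics": (50, 45, 45, 45),
    "Computer Science": (60, 55, 55, 60),
    "Philosophy": (55, 60, 45, 50),
    "Engineering": (35, 40, 25, 35),
}
QUESTIONS_PER_CATEGORY = 20


def summarize(scores: dict) -> tuple[list, list, list]:
    """Split categories into (ahead, tied, behind) for the merged model."""
    ahead, tied, behind = [], [], []
    for category, (merged, *sources) in scores.items():
        best_source = max(sources)
        if merged > best_source:
            ahead.append(category)
        elif merged == best_source:
            tied.append(category)
        else:
            behind.append(category)
    return ahead, tied, behind


if __name__ == "__main__":
    ahead, tied, behind = summarize(CATEGORY_SCORES)
    print(f"One question is worth {100 / QUESTIONS_PER_CATEGORY:.0f} points")
    print(f"Ahead in {len(ahead)}, tied in {len(tied)}, behind in {len(behind)}")
    print("Behind in:", ", ".join(behind))
```

On the numbers above, Cathallama is ahead in 9 categories, tied in 2 (History, Computer Science) and behind in 3 (Biology, Philosophy, Engineering).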

PubMedQA

Model	Success %
Cathallama-70B.Q4_0.gguf	73.00%
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf	76.00%
Nexusflow_Athene-70B.Q4_0.gguf	67.00%
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf	72.00%

Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!