
Cathallama

Awesome model, my new daily driver.

Edit: I am seeing a lot of generated tokens pointing to unknown Unicode addresses that did not show up during testing of this model, so I have stopped using it and am working on a new version.

Notable Performance

  • +9 percentage points overall on MMLU-PRO over LLaMA 3.1 70B at Q4_0 (51% vs. 42%)
  • Strong performance in MMLU-PRO categories overall
  • Great performance during manual testing

Creation workflow

Models merged

  • meta-llama/Meta-Llama-3.1-70B-Instruct
  • turboderp/Cat-Llama-3-70B-instruct
  • Nexusflow/Athene-70B
```mermaid
flowchart TD
    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
    B --> E[Merge]
    D --> E[Merge]
    E -->|Result| F[Cathallama]
```
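
The actual merge recipe is not published in this card, so the following is only a minimal mergekit sketch of stage one of the flowchart (Athene merged with Meta-Llama-3.1). The `slerp` method, `t` value, and output path are illustrative assumptions; the same pattern would be repeated for the Cat branch and for the final merge of the two intermediates.

```python
# Hypothetical mergekit sketch for stage one of the flowchart above.
# The merge method (slerp) and t value are assumptions, not the real recipe.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YAML = """
merge_method: slerp  # assumed for illustration
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
models:
  - model: meta-llama/Meta-Llama-3.1-70B-Instruct
  - model: Nexusflow/Athene-70B
parameters:
  t: 0.5  # interpolation factor, assumed
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(CONFIG_YAML))
run_merge(
    merge_config,
    out_path="./athene-x-llama-3.1",  # stage-one output; name is illustrative
    options=MergeOptions(copy_tokenizer=True),
)
```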


Testing

Hyperparameters

  • Temperature: 0.0 for automated, 0.9 for manual
  • Penalize repeat sequence: 1.05
  • Consider N tokens for penalize: 256
  • Penalize repetition of newlines
  • Top-K sampling: 40
  • Top-P sampling: 0.95
  • Min-P sampling: 0.05

llama.cpp version

  • b3527-2-g2d5dd7bb
  • -fa -ngl -1 -ctk f16 --no-mmap
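
To reproduce the manual-testing setup, the sampling settings and runtime flags above map roughly onto the llama-cpp-python bindings as sketched below. The binding itself, the prompt, and `max_tokens` are assumptions (the card's numbers come from the llama.cpp CLI), and the newline-repetition penalty is omitted since not every binding version exposes it.

```python
# Hypothetical reproduction of the manual-testing settings via the
# llama-cpp-python bindings; the card itself used the llama.cpp CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="Cathallama-70B.Q4_0.gguf",  # one of the tested files
    n_gpu_layers=-1,         # -ngl -1: offload all layers
    flash_attn=True,         # -fa
    use_mmap=False,          # --no-mmap
    last_n_tokens_size=256,  # consider 256 tokens for the repeat penalty
)                            # -ctk f16 is already the default K-cache type

out = llm(
    "Which is bigger, 9.11 or 9.9?",  # one of the manual test cases
    temperature=0.9,       # 0.9 for manual testing, 0.0 for automated
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.05,
    max_tokens=256,        # assumed; not specified in the card
)
print(out["choices"][0]["text"])
```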

Tested Files

  • Cathallama-70B.Q4_0.gguf
  • Nexusflow_Athene-70B.Q4_0.gguf
  • turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
  • Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

Tests

Manual testing

| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| Common Sense | Ball on cup | OK | KO | KO | OK |
|  | Big duck small horse | KO | OK | KO | OK |
|  | Killers | OK | OK | KO | OK |
|  | Strawberry r's | OK | KO | KO | KO |
|  | 9.11 or 9.9 bigger | KO | OK | OK | KO |
|  | Dragon or lens | KO | KO | KO | KO |
|  | Shirts | OK | OK | KO | KO |
|  | Sisters | OK | KO | KO | KO |
|  | Jane faster | OK | OK | OK | OK |
| Programming | JSON | OK | OK | OK | OK |
|  | Python snake game | OK | KO | KO | KO |
| Math | Door window combination | OK | OK | KO | KO |
| Smoke | Poem | OK | OK | OK | OK |
|  | Story | OK | OK | KO | OK |

Note: See sample_generations.txt in the root folder of the repo for the raw generations.

MMLU-PRO

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 51.0% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
| Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |

| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | 50.0% | 45.0% | 20.0% | 40.0% |
| Law | 40.0% | 30.0% | 30.0% | 35.0% |
| Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | 85.0% | 80.0% |
| Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
| History | 65.0% | 60.0% | 55.0% | 65.0% |
| Other | 55.0% | 50.0% | 45.0% | 50.0% |
| Health | 75.0% | 40.0% | 60.0% | 65.0% |
| Economics | 80.0% | 75.0% | 65.0% | 70.0% |
| Math | 45.0% | 35.0% | 15.0% | 40.0% |
| Physics | 50.0% | 45.0% | 45.0% | 45.0% |
| Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
| Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
| Engineering | 35.0% | 40.0% | 25.0% | 35.0% |

Note: MMLU-PRO overall was tested with 100 questions; each category was tested with 20 questions from that category.
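
The evaluation script itself is not included in the card. As a rough guide, the sketch below shows one plausible way to draw the 100-question overall sample and the 20-question category samples from the public TIGER-Lab/MMLU-Pro dataset; the dataset ID and field names are assumptions based on that dataset, and `ask_model` is a hypothetical stand-in for querying the GGUF under test.

```python
# Hypothetical MMLU-PRO sampling harness; the card's actual script is not
# published. Assumes the TIGER-Lab/MMLU-Pro dataset, whose rows carry
# 'question', 'options' (up to 10 choices), 'answer' (the correct letter),
# and 'category'. ask_model() is a hypothetical stand-in.
import random

from datasets import load_dataset

ds = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

def score(rows, ask_model):
    """Fraction of rows where the model's chosen letter matches the key."""
    correct = 0
    for row in rows:
        letters = "ABCDEFGHIJ"[: len(row["options"])]
        prompt = row["question"] + "\n" + "\n".join(
            f"{letter}. {option}"
            for letter, option in zip(letters, row["options"])
        )
        if ask_model(prompt) == row["answer"]:
            correct += 1
    return correct / len(rows)

rng = random.Random(0)  # fixed seed; temperature 0.0 was used for automation
rows = list(ds)
overall_sample = rng.sample(rows, 100)  # 100 questions for the overall score
category_samples = {
    cat: rng.sample([r for r in rows if r["category"] == cat], 20)
    for cat in sorted({r["category"] for r in rows})
}  # 20 questions per category
```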

PubmedQA

| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |

Request

If you are hiring in the EU or can sponsor a visa, PM me :D

PS. Thank you mradermacher for the GGUFs!
