
language:
  - en
license: mit
tags:
  - chatml
  - mistral
  - instruct
  - openhermes
  - economics
datasets:
  - rxavier/economicus
base_model: teknium/OpenHermes-2.5-Mistral-7B
model-index:
  - name: Taurus-7B-1.0
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 63.57
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 83.64
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.5
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 50.21
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 78.14
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 59.36
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rxavier/Taurus-7B-1.0
          name: Open LLM Leaderboard

Taurus 7B 1.0

Description

Taurus is a fine-tune of OpenHermes 2.5 (Mistral 7B) on the Economicus dataset, an instruction dataset synthetically generated from PhD-level economics textbooks.

The model was fine-tuned for 2 epochs with QLoRA using axolotl. The exact config I used can be found here.

Prompt format

Taurus uses ChatML.

<|im_start|>system
System message<|im_end|>
<|im_start|>user
User message<|im_end|>
<|im_start|>assistant
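The layout above is easier to see in code. Below is a minimal sketch that renders a list of messages into that ChatML string. It assumes the standard ChatML convention: each turn is `<|im_start|>{role}\n{content}<|im_end|>\n`, and an open assistant turn is appended at the end. The function name `build_chatml_prompt` is hypothetical. In practice, the tokenizer's own chat template (`tokenizer.apply_chat_template`, as in Usage) is the authoritative renderer and may add tokens such as BOS that this sketch omits.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: render {"role", "content"} dicts in the ChatML layout.

    Sketch only; the model's tokenizer chat template remains authoritative.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "System message"},
    {"role": "user", "content": "User message"},
]
print(build_chatml_prompt(messages))
```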

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "rxavier/Taurus-7B-1.0"
# Load the weights in bfloat16 and move the model to the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
generation_config = GenerationConfig(
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    # Without this, the default max_length (20 tokens) cuts the answer short; adjust as needed
    max_new_tokens=1024,
)

prompt = "Give me latex formulas for extended euler equations"
system_message = "You are an expert in economics with PhD level knowledge. You are helpful, give thorough and clear explanations, and use equations and formulas where needed."

messages = [{"role": "system",
             "content": system_message},
            {"role": "user",
             "content": prompt}]
# Render the ChatML prompt, tokenize it, and place it on the same device as the model
tokens = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs=tokens, generation_config=generation_config)
# generate() returns a tensor of token ids; decode the first (and only) sequence
print(tokenizer.decode(outputs[0]))
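`tokenizer.decode(outputs[0])` returns the whole conversation, prompt included, along with the ChatML markers. One way to keep only the answer is to decode just the new tokens, `outputs[0][tokens.shape[-1]:]`. Another is to post-process the decoded string. Below is a sketch of the second approach using a hypothetical helper, `extract_assistant_reply`. It assumes the special tokens are kept in the decoded text, which is the default (`skip_special_tokens=False`). It also tolerates extra whitespace that a tokenizer may insert around the assistant header.

```python
import re

# Assistant turn header; \s* tolerates whitespace a tokenizer may add on decode
ASSISTANT_HEADER = re.compile(r"<\|im_start\|>\s*assistant\s*")
END_OF_TURN = "<|im_end|>"


def extract_assistant_reply(decoded: str) -> str:
    """Hypothetical helper: return the last assistant turn of a decoded ChatML string."""
    headers = list(ASSISTANT_HEADER.finditer(decoded))
    if not headers:
        raise ValueError("no assistant turn found in the decoded text")
    # The model's reply starts right after the last assistant header
    reply = decoded[headers[-1].end():]
    # Cut at the end-of-turn marker, if the model emitted one
    reply = reply.split(END_OF_TURN, 1)[0]
    return reply.strip()


decoded = (
    "<s><|im_start|>system\nYou are an expert.<|im_end|>\n"
    "<|im_start|>user\nWhat is GDP?<|im_end|>\n"
    "<|im_start|>assistant\nGDP is the market value of final goods and services.<|im_end|>"
)
print(extract_assistant_reply(decoded))
```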

GGUF quants

You can find GGUF quants for llama.cpp here.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

Metric	Value
Avg.	66.40
AI2 Reasoning Challenge (25-Shot)	63.57
HellaSwag (10-Shot)	83.64
MMLU (5-Shot)	63.50
TruthfulQA (0-shot)	50.21
Winogrande (5-shot)	78.14
GSM8k (5-shot)	59.36
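The Avg. row is the unweighted mean of the six benchmark scores. The quick check below recomputes it from the values in the table.

```python
# Scores copied from the table above (Open LLM Leaderboard)
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 63.57,
    "HellaSwag (10-Shot)": 83.64,
    "MMLU (5-Shot)": 63.50,
    "TruthfulQA (0-shot)": 50.21,
    "Winogrande (5-shot)": 78.14,
    "GSM8k (5-shot)": 59.36,
}

# Plain (unweighted) mean over the six benchmarks
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 66.40
```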