license: apache-2.0
tags:
  - chat
base_model:
  - Gryphe/Pantheon-RP-1.6-12b-Nemo
  - Sao10K/MN-12B-Lyra-v3
  - anthracite-org/magnum-v2.5-12b-kto
  - nbeerbower/mistral-nemo-bophades-12B
pipeline_tag: text-generation
model-index:
  - name: StarDust-12b-v1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 54.59
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 34.45
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 5.97
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 3.47
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 13.76
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 26.8
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Luni/StarDust-12b-v1
          name: Open LLM Leaderboard


StarDust-12b-v1

Quants

Description | Use Case

In my opinion, this merge produces more vibrant, less generic, Sonnet-inspired prose, and it can be gentle or harsh as asked. I was personally aiming for more spice while also compensating for an issue I had with Magnum-v2.5: it simply wouldn't stop yapping.

  • This model is intended to be used as a role-playing model.
  • Its direct conversational output is... I can't even call it luck; it's just not made for that.
  • Extension to conversational output: the model is designed for roleplay, so direct instruction-following or general-purpose use is NOT recommended.

Initial Feedback

Initial feedback shows that the model tends to drift toward flirting. If this becomes too much, try steering it with a system prompt that focuses on SFW, non-flirty interactions.

Prompting

Edit: ChatML has proven to be the BEST choice.

Both the Mistral and ChatML formats should work, though I had better results with ChatML. ChatML example:

"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""

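If you are assembling prompts yourself instead of through a frontend, a small helper can render the same ChatML layout as the example above. This is a minimal Python sketch that assumes plain string concatenation is all you need. `build_chatml_prompt` is a hypothetical helper and does not ship with this model. The `system` turn in the usage shows how ChatML conventionally carries a system prompt, such as one steering toward SFW, non-flirty interactions.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string.

    Hypothetical helper for illustration only. Every turn is wrapped in
    <|im_start|>ROLE / <|im_end|> markers on its own lines.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt([
        # Hypothetical system prompt, e.g. to tone down the flirting.
        {"role": "system", "content": "Keep the roleplay SFW and non-flirty."},
        {"role": "user", "content": "Hi there!"},
    ])
    print(prompt)
```

Building the string by hand keeps the sketch dependency-free. If your inference stack already applies a chat template, prefer that so the special tokens are handled consistently.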
Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with Sao10K/MN-12B-Lyra-v3 as the base.
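For readers unfamiliar with DARE TIES, the pure-Python sketch below shows the idea on flat lists of parameters. It is an illustration under stated assumptions, not the mergekit code used for this merge. The `density` and `weights` values are hypothetical because no merge configuration is given here, and real merges apply this per tensor.

The sketch works in three steps:
  • It forms each model's task vector (fine-tuned minus base).
  • DARE randomly drops entries of each task vector and rescales the survivors by 1/density. This stands in for TIES' magnitude trimming.
  • TIES then elects a sign per parameter from the summed weighted deltas and averages only the deltas that agree with it.

```python
import random


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dare(delta, density, rng):
    """DARE: keep each entry with probability `density`, rescaling survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, density=0.5, seed=0):
    """Merge fine-tuned parameter lists into `base` (illustrative sketch, not mergekit)."""
    rng = random.Random(seed)
    # Task vectors (fine-tuned minus base), sparsified by DARE, scaled by each model's weight.
    deltas = [
        [w * d for d in dare([f - b for f, b in zip(ft, base)], density, rng)]
        for ft, w in zip(finetuned, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [delta[i] for delta in deltas]
        # TIES sign election: the sign of the summed weighted deltas wins.
        elected = sign(sum(column))
        # Disjoint merge: only deltas agreeing with the elected sign contribute,
        # normalised by the total weight of the agreeing models.
        agree = [(c, w) for c, w in zip(column, weights) if sign(c) == elected]
        total_w = sum(w for _, w in agree)
        merged.append(b + (sum(c for c, _ in agree) / total_w if total_w else 0.0))
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    model_a = [1.0, -1.0, 2.0]   # hypothetical fine-tuned parameters
    model_b = [-3.0, -1.0, 0.5]  # hypothetical fine-tuned parameters
    print(dare_ties_merge(base, [model_a, model_b], weights=[1.0, 1.0], density=1.0))
    # → [-3.0, -1.0, 1.25]
```

With `density=1.0` nothing is dropped, so the sketch reduces to sign election plus a weighted average of the agreeing deltas. In the first position, the larger negative delta wins the vote and the positive one is discarded rather than averaged in.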

Models Merged

The following models were included in the merge:

Special Thanks

Special thanks to SillyTilly, and to myself, for helping me find the energy to finish this.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                 Value
Avg.                   23.17
IFEval (0-Shot)        54.59
BBH (3-Shot)           34.45
MATH Lvl 5 (4-Shot)     5.97
GPQA (0-shot)           3.47
MuSR (0-shot)          13.76
MMLU-PRO (5-shot)      26.80
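As a quick sanity check, the Avg. row matches the plain arithmetic mean of the six benchmark scores. The snippet below reproduces it from the table values. The dictionary is just a local copy of the numbers listed above.

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "IFEval (0-Shot)": 54.59,
    "BBH (3-Shot)": 34.45,
    "MATH Lvl 5 (4-Shot)": 5.97,
    "GPQA (0-shot)": 3.47,
    "MuSR (0-shot)": 13.76,
    "MMLU-PRO (5-shot)": 26.80,
}

# Unweighted arithmetic mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # → Avg. 23.17
```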