---
language:
  - en
license: apache-2.0
library_name: transformers
datasets:
  - Intel/orca_dpo_pairs
  - mlabonne/chatml_dpo_pairs
model-index:
  - name: neuronovo-9B-v0.4
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 72.44
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 88.33
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.24
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 71.07
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 80.66
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 62.77
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4
          name: Open LLM Leaderboard
---

More information about the previous version, Neuronovo/neuronovo-9B-v0.2, is available here: 🔗Don't stop DPOptimizing!

Author: Jan Kocoń     🔗LinkedIn     🔗Google Scholar     🔗ResearchGate
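
The model ships as a standard transformers causal LM (see library_name in the metadata above). A minimal loading sketch; the dtype and device settings below are assumptions, not taken from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Neuronovo/neuronovo-9B-v0.4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; pick a dtype your hardware supports
    device_map="auto",           # assumption; requires accelerate
)

prompt = "Explain DPO in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```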

Changes relative to Neuronovo/neuronovo-9B-v0.2:

  1. Training Dataset: In addition to the Intel/orca_dpo_pairs dataset, this version incorporates the mlabonne/chatml_dpo_pairs dataset. The combined datasets strengthen the model's capabilities in dialogue and interactive scenarios, further specializing it in natural language understanding and response generation.

  2. Tokenizer and Formatting: The tokenizer now originates directly from the Neuronovo/neuronovo-9B-v0.2 model.

  3. Training Configuration: Training has shifted from a fixed step count (max_steps=200) to epoch-based training (num_train_epochs=1), so the model sees the full combined dataset exactly once rather than stopping after a preset number of steps.

  4. Learning Rate: The learning rate has been reduced to 5e-8. This finer learning rate allows for more precise parameter updates during training, potentially leading to better model performance (a training sketch follows this list).
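
Taken together, these changes describe a fairly standard DPO fine-tuning run. Below is a minimal sketch using TRL's DPOTrainer (API as of TRL ~0.7; newer releases move arguments such as beta into DPOConfig). Only the values marked as coming from this card are stated by the author; the batch size, beta, and the orca column mapping are assumptions for illustration:

```python
# Minimal DPO fine-tuning sketch under the configuration described above.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "Neuronovo/neuronovo-9B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)  # tokenizer taken from v0.2 (item 2)
model = AutoModelForCausalLM.from_pretrained(base)

# Item 1: combine the two preference datasets. DPOTrainer expects
# prompt/chosen/rejected columns; Intel/orca_dpo_pairs stores the prompt as
# separate system + question fields, so it is remapped here (hypothetical mapping).
orca = load_dataset("Intel/orca_dpo_pairs", split="train")
chatml = load_dataset("mlabonne/chatml_dpo_pairs", split="train")

def to_dpo_columns(row):
    return {
        "prompt": (row["system"] + "\n" + row["question"]).strip(),
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

orca = orca.map(to_dpo_columns, remove_columns=orca.column_names)
# chatml_dpo_pairs is assumed to already expose prompt/chosen/rejected.
chatml = chatml.select_columns(["prompt", "chosen", "rejected"])
train_dataset = concatenate_datasets([orca, chatml])

args = TrainingArguments(
    output_dir="neuronovo-9B-v0.4",
    num_train_epochs=1,             # from this card (item 3), replaces max_steps=200
    learning_rate=5e-8,             # from this card (item 4)
    per_device_train_batch_size=2,  # assumption, not stated in the card
    bf16=True,                      # assumption
    report_to="none",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # TRL builds a frozen reference copy of the model
    args=args,
    beta=0.1,             # assumption: TRL's common default
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```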

Open LLM Leaderboard Evaluation Results

Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Neuronovo/neuronovo-9B-v0.4).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 73.42 |
| AI2 Reasoning Challenge (25-Shot) | 72.44 |
| HellaSwag (10-Shot)               | 88.33 |
| MMLU (5-Shot)                     | 65.24 |
| TruthfulQA (0-shot)               | 71.07 |
| Winogrande (5-shot)               | 80.66 |
| GSM8k (5-shot)                    | 62.77 |
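
These scores are produced by the Open LLM Leaderboard, which runs EleutherAI's lm-evaluation-harness. A minimal sketch of checking one benchmark locally (assuming lm-eval v0.4+; the leaderboard pins its own harness version and prompt settings, so local numbers may differ slightly):

```python
import lm_eval

# Evaluate ARC-Challenge with 25 few-shot examples, mirroring the
# leaderboard setting for this task.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Neuronovo/neuronovo-9B-v0.4",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,  # assumption; tune for available memory
)
print(results["results"]["arc_challenge"])  # includes acc_norm
```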