---
license: apache-2.0
base_model:
- deepseek-ai/DeepSeek-R1-Zero
datasets:
- Daemontatox/Reasoning_am
- pbcong/gsm8k_step_by_step
- Daemontatox/Deepthinking-COT
- Daemontatox/Qwqloncotam
language:
- en
library_name: transformers
tags:
- wip
- experimental
- moe
- finetune
- research
- reasoning
pipeline_tag: text-generation
metrics:
- accuracy
- code_eval
model-index:
- name: Zireal-0
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - name: Pass@1
      type: pass@1
      value: 89.8
  - task:
      type: text-generation
    dataset:
      name: MMLU-Redux
      type: mmlu-redux
    metrics:
    - name: Exact Match (EM)
      type: exact_match
      value: 91.9
  - task:
      type: text-generation
    dataset:
      name: MATH-500
      type: math500
    metrics:
    - name: Pass@1
      type: pass@1
      value: 96.3
  - task:
      type: text-generation
    dataset:
      name: AIME 2024
      type: aime2024
    metrics:
    - name: Pass@1
      type: pass@1
      value: 78.8
  - task:
      type: text-generation
    dataset:
      name: Codeforces
      type: codeforces
    metrics:
    - name: Percentile
      type: percentile
      value: 95.3
  - task:
      type: text-generation
    dataset:
      name: LiveCodeBench
      type: livecodebench
    metrics:
    - name: Pass@1
      type: pass@1
      value: 64.9
---

# Zireal-0: Experimental Fine-Tune of R1-Zero
**Zireal-0** is a highly experimental fine-tune of the **DeepSeek-R1-Zero** model, designed for research purposes and not intended for production use. This model focuses on advancing reasoning capabilities and structured inference through fine-tuning on multiple high-quality reasoning datasets.
---
## Key Features
- **Experimental Fine-Tune**: Zireal-0 is a research-oriented fine-tune of the state-of-the-art DeepSeek-R1-Zero model, aimed at exploring advanced reasoning and inference techniques.
- **Research-Only Use Case**: This model is not suitable for production environments and is intended solely for experimental and academic purposes.
- **Enhanced Reasoning Abilities**: Fine-tuned on diverse reasoning datasets to improve logical inference, step-by-step problem-solving, and structured reasoning.
- **Chain-of-Thought (CoT) Focus**: Optimized for multi-step reasoning tasks, leveraging Chain-of-Thought learning to enhance structured and interpretable inference.
---
## Intended Use
Zireal-0 is designed for researchers and developers exploring the following areas:
- **Reasoning and Inference**: Evaluating and improving logical reasoning, step-by-step problem-solving, and structured inference in language models.
- **Chain-of-Thought Learning**: Investigating the effectiveness of CoT techniques in enhancing multi-step reasoning.
- **Experimental Fine-Tuning**: Studying the impact of fine-tuning on specialized datasets for improving model performance in specific domains.
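
For hands-on experimentation, the snippet below is a minimal loading-and-generation sketch using `transformers` (the library this card declares). The repository id `Daemontatox/Zireal-0`, the dtype, and the need for `trust_remote_code` are assumptions rather than details confirmed by this card, and the DeepSeek-R1-Zero base is a very large MoE, so multi-GPU or quantized setups will likely be needed in practice.

```python
# Minimal sketch, assuming the weights live at "Daemontatox/Zireal-0"
# (repository id not stated in this card) in standard transformers format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Daemontatox/Zireal-0"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # adjust to your hardware
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,       # the R1 family may ship custom modeling code
)

# A step-by-step prompt, since the model is tuned for chain-of-thought reasoning
prompt = "Solve step by step: if 3x + 5 = 20, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```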
---
## Limitations
- **Not Production-Ready**: This model is experimental and may exhibit unpredictable behavior. It should not be used in production systems.
- **Uncensored Outputs**: As an uncensored model, Zireal-0 may generate content that is inappropriate or unsafe without additional safeguards.
- **Work in Progress**: The model is still under development, and its performance may vary across tasks and datasets.
---
## Datasets Used for Fine-Tuning
1. **Reasoning_am**: Focused on advanced reasoning tasks.
2. **gsm8k_step_by_step**: A dataset emphasizing step-by-step problem-solving in mathematical reasoning.
3. **Deepthinking-COT**: Designed to enhance Chain-of-Thought reasoning capabilities.
4. **Qwqloncotam**: A specialized dataset for improving structured inference and multi-step reasoning.
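
Each of these datasets is referenced by its Hugging Face id in the card metadata. A quick way to inspect one of them is sketched below; the split name and record layout are assumptions, so check the individual dataset cards.

```python
# Sketch of inspecting one fine-tuning dataset with the `datasets` library.
# The "train" split and the record structure are assumptions.
from datasets import load_dataset

ds = load_dataset("Daemontatox/Deepthinking-COT", split="train")
print(ds)      # features and row count
print(ds[0])   # one example record
```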
---
## Performance Evaluation
The following table presents **Zireal-0**'s performance across several benchmarks, compared to **DeepSeek-R1-Zero**, **DeepSeek-R1**, and **OpenAI o1**:
| Benchmark                    | Zireal-0 | DeepSeek-R1-Zero | DeepSeek-R1 | OpenAI o1 |
|------------------------------|--------|------------------|-------------|-----------|
| **MMLU (Pass@1)** | 90.2 | 88.5 | 90.8 | 91.8 |
| **MMLU-Redux (EM)** | 91.5 | 90.2 | 92.9 | - |
| **MATH-500 (Pass@1)** | 96.0 | 95.1 | 97.3 | 96.4 |
| **AIME 2024 (Pass@1)** | 78.6 | 77.4 | 79.8 | 79.2 |
| **Codeforces (Percentile)** | 95.0 | 94.2 | 96.3 | 96.6 |
| **LiveCodeBench (Pass@1)** | 62.9 | 63.5 | 65.9 | 63.4 |
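
For reference, Pass@1 presumably follows the standard pass@k convention: the probability that at least one of k sampled completions solves a problem, estimated from n samples of which c are correct. The sketch below implements the usual unbiased estimator from the HumanEval/Codex evaluation; whether the scores above were produced exactly this way is an assumption, not stated in this card.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021); illustrative only.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k given n samples per problem, of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 16 samples per problem, 11 correct -> pass@1 = 11/16
print(round(pass_at_k(n=16, c=11, k=1), 4))  # 0.6875
```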
---
## Ethical Considerations
- **Responsible Use**: This model is intended for research purposes only. Users should ensure that its outputs are carefully monitored and evaluated.
- **Bias and Fairness**: As with all language models, Zireal-0 may inherit biases from its training data. Researchers should assess and mitigate potential biases in their applications.
- **Safety**: Due to its uncensored nature, additional safeguards may be required to prevent misuse or harmful outputs.
---
## Future Work
- **Performance Evaluation**: Further testing and benchmarking on reasoning tasks to assess improvements over baseline models.
- **Dataset Expansion**: Incorporating additional datasets to enhance reasoning and inference capabilities.
- **Safety and Alignment**: Exploring methods to align the model with ethical guidelines and safety standards for broader use.