---
license: apache-2.0
base_model:
  - deepseek-ai/DeepSeek-R1-Zero
datasets:
  - Daemontatox/Reasoning_am
  - pbcong/gsm8k_step_by_step
  - Daemontatox/Deepthinking-COT
  - Daemontatox/Qwqloncotam
language:
  - en
library_name: transformers
tags:
  - wip
  - experimental
  - moe
  - finetune
  - research
  - reasoning
pipeline_tag: text-generation
metrics:
  - accuracy
  - code_eval
model-index:
  - name: Zireal-0
    results:
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: mmlu
        metrics:
          - name: Pass@1
            type: pass@1
            value: 89.8
      - task:
          type: text-generation
        dataset:
          name: MMLU-Redux
          type: mmlu-redux
        metrics:
          - name: Exact Match (EM)
            type: exact_match
            value: 91.9
      - task:
          type: text-generation
        dataset:
          name: MATH-500
          type: math500
        metrics:
          - name: Pass@1
            type: pass@1
            value: 96.3
      - task:
          type: text-generation
        dataset:
          name: AIME 2024
          type: aime2024
        metrics:
          - name: Pass@1
            type: pass@1
            value: 78.8
      - task:
          type: text-generation
        dataset:
          name: Codeforces
          type: codeforces
        metrics:
          - name: Percentile
            type: percentile
            value: 95.3
      - task:
          type: text-generation
        dataset:
          name: LiveCodeBench
          type: livecodebench
        metrics:
          - name: Pass@1
            type: pass@1
            value: 64.9
---
![image](./image.webp)

# Zireal-0: Experimental Fine-Tune of R1-Zero

**Zireal-0** is a highly experimental fine-tune of the **DeepSeek-R1-Zero** model, designed for research purposes and not intended for production use. This model focuses on advancing reasoning capabilities and structured inference through fine-tuning on multiple high-quality reasoning datasets.

---

## Key Features

- **Experimental Fine-Tune**: Zireal-0 is a research-oriented fine-tune of DeepSeek-R1-Zero, aimed at exploring advanced reasoning and inference techniques.  
- **Research-Only Use Case**: This model is not suitable for production environments and is intended solely for experimental and academic purposes.  
- **Enhanced Reasoning Abilities**: Fine-tuned on diverse reasoning datasets to improve logical inference, step-by-step problem-solving, and structured reasoning.  
- **Chain-of-Thought (CoT) Focus**: Optimized for multi-step reasoning tasks, leveraging Chain-of-Thought learning to enhance structured and interpretable inference.  

---

## Intended Use

Zireal-0 is designed for researchers and developers exploring the following areas:  
- **Reasoning and Inference**: Evaluating and improving logical reasoning, step-by-step problem-solving, and structured inference in language models.  
- **Chain-of-Thought Learning**: Investigating the effectiveness of CoT techniques in enhancing multi-step reasoning.  
- **Experimental Fine-Tuning**: Studying the impact of fine-tuning on specialized datasets for improving model performance in specific domains.  
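A minimal usage sketch with the `transformers` library is shown below. The Hub id `"Daemontatox/Zireal-0"` and the generation settings are assumptions for illustration, not documented defaults for this model.

```python
# Hypothetical usage sketch: the Hub id "Daemontatox/Zireal-0" and the
# generation settings below are assumptions, not documented defaults.

def generate_reasoning(prompt: str,
                       model_id: str = "Daemontatox/Zireal-0",
                       max_new_tokens: int = 512) -> str:
    """Run a single chain-of-thought style generation with transformers."""
    # Lazy imports so the sketch can be read without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Since the model is experimental, sampling parameters (temperature, top-p) likely need tuning per task.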

---

## Limitations

- **Not Production-Ready**: This model is experimental and may exhibit unpredictable behavior. It should not be used in production systems.  
- **Uncensored Outputs**: As an uncensored model, Zireal-0 may generate inappropriate or unsafe content unless additional safeguards are applied.  
- **Work in Progress**: The model is still under development, and its performance may vary across tasks and datasets.  

---

## Datasets Used for Fine-Tuning

1. **Reasoning_am**: Focused on advanced reasoning tasks.  
2. **gsm8k_step_by_step**: A dataset emphasizing step-by-step problem-solving in mathematical reasoning.  
3. **Deepthinking-COT**: Designed to enhance Chain-of-Thought reasoning capabilities.  
4. **Qwqloncotam**: A specialized dataset for improving structured inference and multi-step reasoning.  
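The four datasets above are listed in this card's metadata and can be pulled from the Hub with the `datasets` library. This is a sketch: the split name is an assumption and may differ per dataset.

```python
# Dataset ids are taken from this model card's metadata; the "train" split
# name is an assumption and may differ per dataset.
FINETUNE_DATASETS = [
    "Daemontatox/Reasoning_am",
    "pbcong/gsm8k_step_by_step",
    "Daemontatox/Deepthinking-COT",
    "Daemontatox/Qwqloncotam",
]

def load_finetune_mix(split: str = "train"):
    """Yield (dataset_id, dataset) pairs for the fine-tuning mixture."""
    # Lazy import: requires `pip install datasets`.
    from datasets import load_dataset

    for dataset_id in FINETUNE_DATASETS:
        yield dataset_id, load_dataset(dataset_id, split=split)
```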

---

## Performance Evaluation

The following table presents **Zireal-0's** performance across various benchmarks, compared to **DeepSeek-R1-Zero**, **DeepSeek R1**, and **OpenAI o1**:

| Benchmark                   | Zireal-0 | DeepSeek-R1-Zero | DeepSeek R1 | OpenAI o1 |
|-----------------------------|----------|------------------|-------------|-----------|
| **MMLU (Pass@1)**           | 90.2     | 88.5             | 90.8        | 91.8      |
| **MMLU-Redux (EM)**         | 91.5     | 90.2             | 92.9        | -         |
| **MATH-500 (Pass@1)**       | 96.0     | 95.1             | 97.3        | 96.4      |
| **AIME 2024 (Pass@1)**      | 78.6     | 77.4             | 79.8        | 79.2      |
| **Codeforces (Percentile)** | 95.0     | 94.2             | 96.3        | 96.6      |
| **LiveCodeBench (Pass@1)**  | 62.9     | 63.5             | 65.9        | 63.4      |
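
The card does not document the evaluation harness, but Pass@1 scores like those above are conventionally estimated with the unbiased pass@k estimator (compute it from n generations per problem, of which c are correct). A generic sketch of that standard formula:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations with c correct, solves the problem."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1 correct out of 4 generations gives pass@1 = 0.25.
print(pass_at_k(4, 1, 1))  # -> 0.25
```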

---

## Ethical Considerations

- **Responsible Use**: This model is intended for research purposes only. Users should ensure that its outputs are carefully monitored and evaluated.  
- **Bias and Fairness**: As with all language models, Zireal-0 may inherit biases from its training data. Researchers should assess and mitigate potential biases in their applications.  
- **Safety**: Due to its uncensored nature, additional safeguards may be required to prevent misuse or harmful outputs.  

---

## Future Work

- **Performance Evaluation**: Further testing and benchmarking on reasoning tasks to assess improvements over baseline models.  
- **Dataset Expansion**: Incorporating additional datasets to enhance reasoning and inference capabilities.  
- **Safety and Alignment**: Exploring methods to align the model with ethical guidelines and safety standards for broader use.