---
language:
- en
license: llama3.2
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
datasets:
- openai/gsm8k
model-index:
- name: ReasoningCore-3B-0
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 73.41
name: strict accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 22.17
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 15.86
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 3.02
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 2.56
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 24.14
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-0
name: Open LLM Leaderboard
---
A better version is available: [ReasoningCore-3B-RE1-V2](https://huggingface.co/EpistemeAI/ReasoningCore-3B-RE1-V2)
Note: This is an experimental model.
# ReasoningCore‑3B
**ReasoningCore‑3B** is a multilingual, reasoning‑enhanced large language model developed by EpistemeAI. Pretrained on large amounts of publicly available data and instruction‑tuned to excel at nuanced reasoning, dialogue management, retrieval, and summarization tasks, it often outperforms current open‑source and proprietary conversational models of comparable size on a range of industry benchmarks.
---
## Model Information
- **Model Developer:** EpistemeAI
- **Model Architecture:**
ReasoningCore‑3B is an auto‑regressive language model built on an optimized transformer architecture. It incorporates specialized reasoning pathways and was fine‑tuned with Group Robust Preference Optimization (GRPO), together with supervised learning and reinforcement learning with human feedback (RLHF), to align with human expectations for clarity, accuracy, and safety in complex tasks (an illustrative training sketch follows this list).
| | Training Data | Params | Input Modalities | Output Modalities | Context Length | GQA | Shared Embeddings | Token Count | Knowledge Cutoff |
|--------------------------------|--------------------------------------------------|--------|-----------------------|------------------------------|----------------|-----|-------------------|----------------|-------------------|
| **ReasoningCore‑3B (text only)** | A new mix of publicly available online data. | 3B | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 |
- **Supported Languages:**
Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. While the pretraining included a broader range of languages, the model can be fine‑tuned for additional languages in compliance with the community license and acceptable use policies.
- **Model Release Date:** Sept 25, 2024
- **Status:** Static model trained on an offline dataset. Future iterations may further enhance its reasoning capabilities and safety features.
- **License:** Use is governed by the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE) (a custom, commercial license agreement).
- **Feedback:** For questions or comments, please refer to the [GitHub repository README](https://github.com/meta-llama/llama-models/tree/main/models/llama3_2) or follow the linked instructions.
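Below is a minimal, illustrative sketch of a GRPO fine‑tune of the kind described above, assuming TRL's `GRPOTrainer` (implied by this card's `trl` tag) and the `openai/gsm8k` dataset listed in the metadata; the reward function and hyperparameters are placeholders, not the actual training recipe.
```python
# Illustrative GRPO sketch with TRL; reward and hyperparameters are
# placeholders, not the recipe used for ReasoningCore-3B-0.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K has "question"/"answer" columns; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def toy_reward(completions, **kwargs):
    # Placeholder reward favoring substantive completions; a real run would
    # instead score the correctness of the final answer against the dataset.
    return [min(len(c), 200) / 200 for c in completions]

trainer = GRPOTrainer(
    model="unsloth/Llama-3.2-3B-Instruct",
    reward_funcs=toy_reward,
    args=GRPOConfig(output_dir="grpo-sketch", per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```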
---
## Intended Use
### Use Cases
- **Conversational AI:** Assistant‑like interactions.
- **Knowledge Retrieval & Summarization:** Dynamic extraction and condensation of information.
- **Mobile AI‑Powered Writing Assistants:** Query reformulation and natural language generation.
- **General Natural Language Generation:** Any application that benefits from advanced reasoning abilities.
### Out of Scope
- Deployments that violate applicable laws or trade compliance regulations.
- Use cases that conflict with the Acceptable Use Policy or licensing terms.
- Deployments in languages not explicitly supported (unless additional safety and performance validations are performed).
---
## How to Use
ReasoningCore‑3B can be integrated using popular machine learning frameworks. The primary method, via Hugging Face Transformers, is shown below.
### Use with Transformers
Ensure you have `transformers` version 4.43.0 or later installed:
### Use a system prompt
Instruct the model to separate its reasoning from its final answer:
```python
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
```
```bash
pip install --upgrade transformers
```

```python
import torch
from transformers import pipeline
model_id = "EpistemeAI/ReasoningCore-3B-0"
pipe = pipeline(
"text-generation",
model=model_id,
torch_dtype=torch.bfloat16,
device_map="auto"
)
print(pipe("The secret to effective reasoning is"))
```
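To apply the system prompt, pass chat messages to the same pipeline. The sketch below reuses `pipe` and `SYSTEM_PROMPT` from the blocks above; `max_new_tokens` and the example question are illustrative.
```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"},
]
out = pipe(messages, max_new_tokens=512)
# The chat pipeline returns the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```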
### For mathematical problems
Add "Please reason step by step, and put your final answer within \boxed{}." to the system prompt.
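For instance, again reusing `pipe` and `SYSTEM_PROMPT` from above (the question is illustrative):
```python
math_messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT
        + "\nPlease reason step by step, and put your final answer within \\boxed{}.",
    },
    {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
]
print(pipe(math_messages, max_new_tokens=512)[0]["generated_text"][-1]["content"])
```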
## Responsibility & Safety
### Responsible Deployment
#### Approach:
- **ReasoningCore‑3B** is a foundational technology that includes built‑in safety guardrails. Developers are encouraged to integrate additional safeguards tailored to their specific applications.
#### System‑Level Safety:
- The model is designed to be deployed as part of a broader system that implements safety measures (e.g., Prompt Guard, Code Shield) to ensure outputs remain safe even under adversarial conditions; one way to screen inputs is sketched below.
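A minimal sketch of such input screening, assuming Meta's `meta-llama/Prompt-Guard-86M` classifier; the label names and rejection policy are assumptions drawn from that model's card, not part of ReasoningCore‑3B.
```python
from transformers import pipeline

# Prompt-injection/jailbreak classifier applied to user input before generation.
guard = pipeline("text-classification", model="meta-llama/Prompt-Guard-86M")

def screened_generate(pipe, system_prompt: str, user_input: str) -> str:
    verdict = guard(user_input)[0]  # e.g. {"label": "BENIGN", "score": 0.99}
    if verdict["label"] != "BENIGN":
        return f"Input rejected: {verdict['label']}"
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    return pipe(messages, max_new_tokens=512)[0]["generated_text"][-1]["content"]
```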
---
### Safety Fine‑Tuning & Data Strategy
#### Objectives:
- Provide a reliable tool for building secure and helpful reasoning systems.
- Mitigate adversarial misuse through advanced data selection and response optimization techniques.
#### Methodology:
- Incorporate adversarial prompts during training to refine model refusals and response tone.
- Combine human‑curated data with synthetic data.
- Utilize iterative fine‑tuning using supervised learning, rejection sampling, and preference optimization.
---
### Evaluations and Red Teaming
#### Scaled Evaluations:
- Dedicated adversarial datasets were used to rigorously test the model’s robustness. Developers should perform context‑specific evaluations.
#### Red Teaming:
- Experts in cybersecurity, adversarial machine learning, and responsible AI conducted recurring red team exercises to identify vulnerabilities and improve both performance and safety.
---
### Critical Risk Mitigations
- **CBRNE:**
The model has been evaluated to ensure it does not enhance capabilities for harmful activities involving chemical, biological, radiological, nuclear, or explosive materials.
- **Child Safety:**
Expert assessments were conducted to evaluate and mitigate potential child safety risks.
- **Cyber Attacks:**
Measures were taken to ensure the model cannot autonomously facilitate cyber‑offensive operations.
---
### Ethical Considerations and Limitations
#### Core Values:
- **ReasoningCore‑3B** is built on the values of openness, inclusivity, and helpfulness. It is designed to respect user autonomy and foster free thought and expression while mitigating potential harm.
#### Testing and Limitations:
- Despite extensive testing across diverse scenarios, the model may occasionally produce inaccurate, biased, or objectionable outputs. Developers must perform additional safety testing and integrate further safeguards as needed.
#### Resources for safe deployment, from Meta:
- [Responsible Use Guide](https://llama.meta.com/responsible-use-guide)
- [Trust and Safety Resources](https://llama.meta.com/trust-and-safety)
- [Getting Started Guide](https://llama.meta.com/docs/get-started)
---
### Conclusion
**ReasoningCore‑3B** represents a significant advancement in multilingual, reasoning‑enhanced language models. Optimized for tasks requiring deep reasoning, contextual understanding, and safe, helpful interactions, it offers a powerful tool for both commercial and research applications. We invite developers and researchers to explore its capabilities and contribute to building secure, innovative AI systems.
For further details, questions, or feedback, please email [email protected]
# Uploaded model
- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/EpistemeAI__ReasoningCore-3B-0-details)
| Metric |Value|
|-------------------|----:|
|Avg. |23.53|
|IFEval (0-Shot) |73.41|
|BBH (3-Shot) |22.17|
|MATH Lvl 5 (4-Shot)|15.86|
|GPQA (0-shot) | 3.02|
|MuSR (0-shot) | 2.56|
|MMLU-PRO (5-shot) |24.14|