Update README.md
language:
- en
pipeline_tag: text-generation
---
Unlike previous models we've uploaded, this is the best one we've published! It answers in two steps, reasoning -> final answer, like ChatGPT o1-mini and Gemini 2.0 Flash Thinking Experimental. This model is our new flagship.
# 🧀 Which quant is right for you? (all tested!)

- ***Q3:*** This quant should be used on most high-end devices, like an RTX 2080 Ti. Responses are very high quality, but it is slightly slower than Q4. (Runs at ~1 token per second or less on a Samsung Z Fold 5 smartphone.)
- ***Q4:*** This quant should be used on high-end modern devices, like an RTX 3080, or any GPU, TPU, or CPU that is powerful enough and has at least 15 GB of available memory. (We personally use it on servers and high-end computers.) Recommended.
- ***Q8:*** This quant should be used on very high-end modern devices that can handle its size. It is very capable, but Q4 is more well-rounded. Not recommended.
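The guidance above can be sketched as a tiny picker. The memory thresholds are assumptions drawn loosely from the bullets (Q4 wants roughly 15 GB free); they are illustrative, not official requirements:

```python
def pick_quant(available_gb: float) -> str:
    """Illustrative quant picker based on the guidance above.

    Thresholds are assumptions, not official requirements:
    Q4 wants ~15 GB of available memory per the bullet above;
    the Q8 cutoff here is a guess for "very high-end" hardware.
    """
    if available_gb >= 24:
        return "Q8"  # very high-end only; Q4 remains more well-rounded
    if available_gb >= 15:
        return "Q4"  # the recommended tier
    return "Q3"      # most high-end consumer devices


print(pick_quant(10))  # a 10 GB card without spare memory -> "Q3"
```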
# 🧀 Information
- ⚠️ A low temperature must be used to keep the reasoning reliable; we use 0.3 - 0.8!
- ⚠️ Due to the current prompt format, it may sometimes emit <|FinalAnswer|> at the end; you can ignore this or modify the prompt format.
- This is our flagship model, with top-tier reasoning rivaling gemini-flash-exp-2.0-thinking and o1-mini. Results are overall similar to both. We are not comparing against QwQ, as its much longer responses waste tokens.
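If you would rather clean up the stray tag than ignore it, a minimal post-processing sketch (the helper name is ours; the tag string comes from the warning above):

```python
FINAL_ANSWER_TAG = "<|FinalAnswer|>"


def strip_final_answer_tag(text: str) -> str:
    """Remove a trailing <|FinalAnswer|> left over from the prompt format."""
    text = text.rstrip()
    if text.endswith(FINAL_ANSWER_TAG):
        text = text[: -len(FINAL_ANSWER_TAG)].rstrip()
    return text


print(strip_final_answer_tag("The answer is 42.<|FinalAnswer|>"))  # -> The answer is 42.
```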
the model uses this prompt format: (modified phi-4 prompt)

```
{{ .Response }}<|FinalAnswer|><|im_end|>
```
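The `{{ .Response }}` placeholder suggests an Ollama-style template, so if you run a GGUF of this model through Ollama, the low-temperature advice above can be baked into a Modelfile. This is a sketch; the file path is hypothetical and the full TEMPLATE is omitted rather than guessed:

```
FROM ./model-q4_k_m.gguf
PARAMETER temperature 0.3
```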
# 🧀 Examples:
(q4_k_m, 10GB RTX 3080, 64GB memory, running inside MSTY; all use "You are a friendly ai assistant." as the system prompt.)

**example 1:**

![example1part1.png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/Dcd6-wbpDQuXoulHaqATo.png)

**example 2:**

![example2](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/c4h-nw0DPTrQgX-_tvBoT.png)
# 🧀 Uploaded model

- **Developed by:** Pinkstack
- **License:** MIT