Update README.md
language:
- en
pipeline_tag: text-generation
---
Unlike previous models we've uploaded, this is the best one we've published! It answers in two steps, reasoning -> final answer, like ChatGPT o1-mini and Gemini 2.0 Flash Thinking Experimental. This model is our new flagship.
# 🧀 Which quant is right for you? (all tested!)

- ***Q3:*** This quant should be used on most high-end devices, like an RTX 2080 Ti. Responses are very high quality, but it is slightly slower than Q4. (Runs at ~1 token per second or less on a Samsung Z Fold 5 smartphone.)
- ***Q4:*** This quant should be used on high-end modern devices, like an RTX 3080, or any GPU, TPU, or CPU that is powerful enough and has at least 15 GB of available memory. (We personally use it on servers and high-end computers.) Recommended.
- ***Q8:*** This quant should be used on very high-end modern devices that can handle its size. It is very capable, but Q4 is more well-rounded. Not recommended.
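The guidance above can be sketched as a tiny picker. The memory thresholds are assumptions drawn loosely from the bullets (Q4 wants roughly 15 GB free); they are illustrative, not official requirements:

```python
def pick_quant(available_gb: float) -> str:
    """Illustrative quant picker based on the guidance above.

    Thresholds are assumptions, not official requirements:
    Q4 wants ~15 GB of available memory per the bullet above;
    the Q8 cutoff here is a guess for "very high-end" hardware.
    """
    if available_gb >= 24:
        return "Q8"  # very high-end only; Q4 remains more well-rounded
    if available_gb >= 15:
        return "Q4"  # the recommended tier
    return "Q3"      # most high-end consumer devices


print(pick_quant(10))  # a 10 GB card without spare memory -> "Q3"
```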
# 🧀 Information
- ⚠️ A low temperature must be used to keep the reasoning reliable; we use 0.3 - 0.8!
- ⚠️ Due to the current prompt format, it may sometimes emit <|FinalAnswer|> at the end; you can ignore this or modify the prompt format.
- This is our flagship model, with top-tier reasoning rivaling gemini-flash-exp-2.0-thinking and o1-mini. Results are overall similar to both. We are not comparing against QwQ, as its much longer responses waste tokens.
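If you would rather clean up the stray tag than ignore it, a minimal post-processing sketch (the helper name is ours; the tag string comes from the warning above):

```python
FINAL_ANSWER_TAG = "<|FinalAnswer|>"


def strip_final_answer_tag(text: str) -> str:
    """Remove a trailing <|FinalAnswer|> left over from the prompt format."""
    text = text.rstrip()
    if text.endswith(FINAL_ANSWER_TAG):
        text = text[: -len(FINAL_ANSWER_TAG)].rstrip()
    return text


print(strip_final_answer_tag("The answer is 42.<|FinalAnswer|>"))  # -> The answer is 42.
```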
the model uses this prompt format: (modified phi-4 prompt)

```
{{ .Response }}<|FinalAnswer|><|im_end|>
```
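The `{{ .Response }}` placeholder suggests an Ollama-style template, so if you run a GGUF of this model through Ollama, the low-temperature advice above can be baked into a Modelfile. This is a sketch; the file path is hypothetical and the full TEMPLATE is omitted rather than guessed:

```
FROM ./model-q4_k_m.gguf
PARAMETER temperature 0.3
```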
# 🧀 Examples:
(q4_k_m, 10GB RTX 3080, 64GB memory, running inside MSTY; all use "You are a friendly ai assistant." as the system prompt.)

**example 1:**

![example1part1.png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/Dcd6-wbpDQuXoulHaqATo.png)

**example 2:**

![example2](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/c4h-nw0DPTrQgX-_tvBoT.png)
# 🧀 Uploaded model

- **Developed by:** Pinkstack
- **License:** MIT