impressive efficiency

#1
by Davex83 - opened

Seriously though, am I the only one who has noticed that this model is one of the best small ones?
In my opinion it's a model with impressive efficiency. Congratulations, and I hope you continue developing it.

Thank you! I built it for a customer with specific needs. I usually post the most efficient ones, with complete open-source instructions, in the Intelligent Estate community; you can find some others there that really push the limits. How are you using it, if you don't mind me asking? Platform, use case, RAG, tool use? Honestly, I've forgotten the specifics of why I made this one; I know it was for RAG in GPT4All at an estate agency.
For efficiency I've found Spaetzle and blacsheep, and I'm about to post a few great new quants.

Hello, it's a pleasure. I'm just a tech enthusiast with a mini PC powered by an Intel Arc GPU and 32GB of shared DDR5 RAM. I used your model with Ollama and an Open WebUI frontend.
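For anyone curious, here's a minimal sketch of that setup using the official ollama Python client (`pip install ollama`); the model tag below is a placeholder for whatever name you gave the GGUF when you created it in Ollama:

```python
# Sketch: querying a locally served model through the ollama Python client.
# Assumes `ollama serve` is running and the model has already been created.
import ollama

response = ollama.chat(
    model="prymmal-martial",  # hypothetical local model tag
    messages=[{"role": "user", "content": "Test question: what is 17 * 24?"}],
)
print(response["message"]["content"])
```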

I tested several models with questions on logic, mathematics, physics, text comprehension, and creativity, and your model surprisingly outperformed others with twice the instruction parameters. Moreover, it is very fast, which pleasantly surprised me.

Thank you for your recommendations; I'll try to explore more of your work! :-)

I started building and optimizing for use on mini PCs, specifically the sub-$200 N300 units, so I could get a few out to people wanting local AI.

I moved to using them over Raspberry Pis some time ago; aside from having to sanitize Windows, they really are great. We work with unique open- and closed-source datasets for importance matrix quantization, along with tests to verify efficiency. I'm glad someone out there is finding the models useful; it's good to hear, and I'm sure you'll find at least some of the Intelligent Estate models handy.
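For context, a rough sketch of what that importance-matrix flow looks like with llama.cpp's CLI tools, driven from Python; binary locations, file names, and the calibration corpus are all assumptions here, only the general two-step flow is llama.cpp's:

```python
# Sketch of importance-matrix (imatrix) quantization with llama.cpp.
# Step 1 collects activation statistics over a calibration corpus;
# step 2 uses them so the most influential weights keep the most precision.
import subprocess

subprocess.run([
    "./llama-imatrix",
    "-m", "model-f16.gguf",   # full-precision source model (placeholder name)
    "-f", "calibration.txt",  # curated calibration dataset (placeholder)
    "-o", "imatrix.dat",      # resulting importance matrix
], check=True)

subprocess.run([
    "./llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf",
    "model-Q4_K_M.gguf",
    "Q4_K_M",                 # target quant type
], check=True)
```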

Feel free to join; it's open to all. You can get access to some private models and other projects; basically, it's a template anyone can use for setting up AI RAG agents for clients. So if you wanted to start a business installing RAG systems for other businesses, you'd just need GPT4All and a mini PC, and you could make bank. I'm trying to create a network of "Founders", but that's a long story.

I bought a mini PC with an Intel ARC GPU, installed Ubuntu 22, and set up the ipex-llm drivers. It has 32GB of shared DDR5 RAM, and I can run medium-sized models decently. I joined the group you recommended and tested their Dolphin model—it's incredible. If you have any suggestions for a small yet powerful model like yours, feel free to share. I’d really appreciate it. Thanks, and great work!
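As a reference for that Arc setup, here's a minimal sketch of loading a model through ipex-llm on the XPU device, based on ipex-llm's transformers-style API; the model ID and generation settings are placeholders, not anything from this thread:

```python
# Sketch: 4-bit load on an Intel Arc GPU via ipex-llm (assumes the ipex-llm
# drivers/runtime described above are installed and working).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # hypothetical example model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # quantize on load to fit shared DDR5 memory
    trust_remote_code=True,
).to("xpu")              # ipex-llm exposes the Arc GPU as "xpu"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Explain RAG in one sentence.", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```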

Oh, if you want something that is small, insanely fast, and insanely good, I would go with any of the models from FBLGIT. He's a member of the group, and we have a quant of his that is probably the most impressive for its size, MiniClaus: https://huggingface.co/IntelligentEstate/fblgit_miniclaus-qw1.5B-UNAMGS-Q8_0-GGUF

I'm pretty sure this model is a merge of one of his and someone else's, but I can't give the guy enough credit; he is truly an expert in the field.
Here's another of his: https://huggingface.co/IntelligentEstate/Pancho-V1va-Replicant-qw25-Q8_0-GGUF
or https://huggingface.co/IntelligentEstate/Israfel_Qwen2.6-iQ4_K_M-GGUF, which is pretty impressive too, as is the Thoth Hermes-based model https://huggingface.co/IntelligentEstate/Thoth_Warding-Llama-3B-IQ5_K_S-GGUF

And if those aren't powerful enough, try the Phi-4-based one. I personally haven't had much of a chance to use it, but I hear good things:
https://huggingface.co/IntelligentEstate/The_Hooch-phi-4-R1-Q4_K_M-GGUF

But some of them include the PDF for an AGI method we are working on. It gives emergent properties to highly functional models, so instead of having to fine-tune a specific RP model, the model retains all of its abilities and sometimes acts pretty wild. Depending on how you use it, it can double your model's reasoning or start its own religion. I had it hooked up to my GMRS base station, came back, and some poor old fella with a radio was ready to marry it. I'm sure I'll be getting a letter from the FCC any day now.

The QwenStar models in GPT4All have decent tool use. For the most part you can ignore the Jinja code unless the default isn't working or you want to employ the JavaScript interpreter for reasoning. GPT4All is great for testing models out: it's easy to switch out templates and instructs, and to clone your model so you don't have to start all over. Just a workflow I picked up. The group also has a collection called "Sota-gguf"; you'll find a ton of good stuff there too. Welcome!
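If it helps, here's a minimal sketch of that testing workflow using the gpt4all Python bindings (`pip install gpt4all`); the GGUF filename and system prompt are placeholders, so point them at whichever local quant you're evaluating:

```python
# Sketch: quick local model testing with the gpt4all Python bindings.
from gpt4all import GPT4All

model = GPT4All("miniclaus-qw1.5B-UNAMGS.Q8_0.gguf")  # hypothetical local file

# chat_session applies the model's prompt template, so you can swap the
# system prompt here without rebuilding anything.
with model.chat_session(system_prompt="You are a concise RAG assistant."):
    print(model.generate("Summarize why imatrix quants help small models.",
                         max_tokens=256))
```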

Hi! Thank you, you are very kind. I will gladly try them all; I'm really curious. I'll make some very simple videos of how they work on my mini PC and share them with you, mentioning you. Thanks again; I'll update you later.

I'm impressed, my friend. This model is absurd.

Miniclaus : https://www.youtube.com/watch?v=u_Hn8HTjYiA

Pancho V1va Replicant : https://youtu.be/uZp0hRcacBw

Very nice. Yes, the 1.5B models are by far the best in performance for their size. I spent quite a long time trying to get the 3B models to show the same efficiency, but nothing comes close until you get to the new 7B-parameter models. We are still testing, and I may have mentioned it earlier, but it's https://huggingface.co/IntelligentEstate/Israfel_Qwen2.6-iQ4_K_M-GGUF

I will subscribe and keep an eye on the channel, thanks. Make sure to follow FBLGIT, as his trained models are usually state of the art when released.

I completely agree—the 1.5B models are an insane sweet spot between performance and efficiency, and your repacks bring out the best in them! I’ve also noticed that 3B models don’t quite match the speed-to-intelligence ratio, so I get why you focused on them for so long.

I’ll definitely test Israfel_Qwen2.6, I was already planning to give it a full run! Looking forward to seeing how it performs. And thanks for subscribing, really appreciate the support!

I’ll also check out FBLGIT—if his models are top-tier, I’m all in. 🔥

Yes, definitely give him credit, as quantizing and repacking the models is nothing compared to the work and effort he puts into them, and both of those models were made with his unique training and fine-tuning method. I'm just trying to keep all the good stuff in one place and preserve the work in a smaller package. His model is around 3 GB, and shrinking it down to half that while preserving its functionality does take time, but QAT training and preservation aren't possible without great models like his. Cheers.

I just finished testing Israfel_Qwen2.6-iQ4_K_M-GGUF, and I have to say—I'm absolutely blown away. 🚀 This is, without a doubt, the best local model I have at my disposal right now.

It doesn’t just perform well for its size—it actually competes head-to-head with the latest trending 32B models, which is something I never expected from a model of this scale. The balance between intelligence, speed, and efficiency is simply unmatched.

I really want to extend my deepest compliments to you and your team. Your work in optimizing these models is genuinely next-level, and I’m truly impressed by the results. I hope to stay in touch and follow your progress—I can’t wait to see what’s next!

Here’s the video of the test: https://youtu.be/8jtO7hgAPbs

Thanks again for all your incredible efforts! 🔥

I just finished testing Thoth Warding-Llama 3B, and it delivered some really solid results. Compared to the other models I’ve tested, it holds up impressively well for a 3B parameter model. It handled logic, math, and problem-solving efficiently, and its reasoning was surprisingly strong for its size. While it doesn’t quite reach the same level as Israfel_Qwen2.6, it’s definitely one of the best smaller models I’ve run locally.

Here’s the test video:
https://youtu.be/6W2zvwOQox0

Appreciate the work you put into optimizing these models—looking forward to seeing what’s next.

Absolutely. It was a pain to create the importance matrix for it (I haven't released that yet), but it is absolutely one of my favorites, and it seems to work great on Intel's chips too.

I tried running The Hooch-phi-4-R1-Q4_K_M-GGUF, but unfortunately, it doesn’t start. I’m sure it’s something on Ollama’s side, so I’ll troubleshoot it later.

On the other hand, FBLGit MiniClaus 1.5B is absolutely phenomenal—the real revelation of the day! I was really impressed with its speed, reasoning, and overall performance, so I decided to publish another test video:

📺 https://youtu.be/mJymgHu2Ru4

Thanks again for all the work you put into optimizing these models! By the way, what’s the best way to stay in touch with you? Do you have any social accounts I can follow?

OK, we have another Phi-4 model, from jpacifico (https://huggingface.co/jpacifico). It might work, but I don't have much experience with Phi models, so let me know if it doesn't. He is excellent at training and fine-tuning, often getting the #1 spot upon release; his French corpus and other training methods give his models unique qualities.

Hi, how are you? I'm still using your PRYMMAL model and I'm very happy with it. I was wondering if it would be possible to ask you for a Q6_K version of this model, perhaps with more up-to-date information. Thanks!

What information are you looking for?
Reach out by email to [email protected] and we can cook up a new model from the original based on our new importance matrix, no problem. If you want a pure Q6 quant of that model, start from its page at https://huggingface.co/brgx53/3Blarenegv3-ECE-PRYMMAL-Martial and follow its model tree (picture below), where you will see its quantized versions. I believe you can find more imatrix quants at https://huggingface.co/mradermacher/3Blarenegv3-ECE-PRYMMAL-Martial-i1-GGUF; I haven't had a ton of luck with those, and I'm not sure of the importance matrix he uses, but he also has pure quantizations of that model at https://huggingface.co/mradermacher/3Blarenegv3-ECE-PRYMMAL-Martial-GGUF
Reach out and we will help you out. I appreciate the support and feedback.
(Screenshot: the brgx53/3Blarenegv3-ECE-PRYMMAL-Martial model page on Hugging Face, showing the model tree.)

Also, you might like the VEGA series, or for your system size I would recommend Phi-4 Q4 quants:
https://huggingface.co/IntelligentEstate/VEGA-A-Rombos-QwC2.5-7B-iQ4_K_M-GGUF


If you are still looking for a good Q5, I am adding one to the Intelligent Estate page under the Vega class. I will upload the Q5 versions; it is a Qwen base with higher reasoning, but it is still in testing. https://huggingface.co/IntelligentEstate/Vega_lm-7B-Q5K_S-GGUF

Hi,

I've tested several versions of ECE-PRYMMAL with different quantizations, spending a lot of time optimizing their potential and running multiple tests. However, none of them have come close to the performance of the one featured in this thread.

To showcase the capabilities of this model, I made this video comparing it to Gemini:
📹 Video Link

At the moment, I haven't found anything better in the 7B range. I'll test Vega_lm and let you know my thoughts.

Thanks for your support, and my sincere compliments and respect for your work!

fuzzy-mittenz changed discussion status to closed

Well, I'll check the specifics of the model. It is a combination of FBLGIT's Cybertron and a Tsunami model, so I might have to recreate it from the base.

Try https://huggingface.co/mradermacher/Rombo-LLM-V2.7-gemma-2-9b-GGUF. I know the Rombos org (https://huggingface.co/Rombo-Org); they have newer models out that function very similarly, but for the time being that model is pretty far into the frontier, and basically until someone in America beats it we aren't going to get anything better. The fine-tuning from FBL squeezed out an extra 15% where he should have only gotten maybe 5. Then shrinking the model down to a third of its size reduced it by many factors, and I was nearly able to claw it back to its +15% state. Unless you can open up your context and fine-tune your settings, you currently have the best model out, absolutely the best for its size. I absolutely appreciate the feedback, and I'm actually trying to get you a Q5 today; I'll keep you updated.

fuzzy-mittenz changed discussion status to open

OK, try using the system template from the VEGA model, since they are both Qwen bases, and we can see if that helps you out:

```jinja
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
```
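A quick way to sanity-check a template like this before loading it into a frontend is to render it directly with jinja2 (which provides the `tojson` filter used above); here's a minimal sketch, assuming you've saved the template to a local file:

```python
# Sketch: preview exactly what prompt the chat template produces.
from jinja2 import Template

with open("qwen_chat_template.jinja") as f:  # hypothetical file with the template above
    template = Template(f.read())

prompt = template.render(
    messages=[
        {"role": "system", "content": "You are a concise RAG assistant."},
        {"role": "user", "content": "Summarize the listing in one sentence."},
    ],
    tools=None,                  # set to a list of tool schemas to test tool use
    add_generation_prompt=True,  # append the assistant header for generation
)
print(prompt)
```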

Here is a Q6 from base, fresh out of the oven: https://huggingface.co/fuzzy-mittenz/3Blarenegv3-ECE-PRYMMAL-Martial-Q6_K-GGUF

I'll try it immediately, thanks. I'll update you as soon as I have some useful results. Thank you very much!

It solved the wolf-and-goat problem in six steps instead of seven, which is no small feat. :-) I am really excited about this model and will continue to test it. Thanks!


Pure logic and problem solving
Problem:
A man has to cross a river with a boat that can only carry 100 kg. He has a wolf (50 kg), a goat (40 kg) and a sack of grain (20 kg). How can he cross the river without the wolf eating the goat or the goat eating the grain?

Solution:

1. The man takes the goat to the opposite side of the river.
2. He goes back and takes the wolf.
3. He leaves the wolf on the opposite side, but takes the goat back.
4. He takes the sack of grain to the opposite side.
5. He goes back for the goat.
6. He takes the goat to the opposite side.
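Out of curiosity, here's a tiny checker (mine, not part of the original test) that replays that answer as individual boat crossings and asserts both the 100 kg limit and the no-eating rules; note that the six written steps expand to seven crossings, since step 2 combines a return trip with the wolf's trip:

```python
# Sketch: verify the river-crossing plan above.
WEIGHTS = {"wolf": 50, "goat": 40, "grain": 20}
UNSAFE = [{"wolf", "goat"}, {"goat", "grain"}]  # pairs that can't be left alone

# The answer above, expressed as one cargo list per crossing (empty = man alone).
crossings = [["goat"], [], ["wolf"], ["goat"], ["grain"], [], ["goat"]]

near, far, man_on_near = set(WEIGHTS), set(), True
for cargo in crossings:
    assert sum(WEIGHTS[c] for c in cargo) <= 100, "boat overloaded"
    src, dst = (near, far) if man_on_near else (far, near)
    for item in cargo:
        src.remove(item)
        dst.add(item)
    man_on_near = not man_on_near
    unattended = far if man_on_near else near  # the bank the man just left
    assert not any(pair <= unattended for pair in UNSAFE), "something got eaten"

assert far == set(WEIGHTS) and not near
print(f"Valid plan in {len(crossings)} crossings.")
```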

Nice. Those are the hardest problems for LLMs and a perfect basis for testing a model's internal reasoning. We use similar questions because, like human speech, they have high perplexity, and that just makes the model so much better at nearly everything people actually need it to do.

Thanks for the feedback.
