Tensor-wise (TWQ) and layer-wise (LWQ) quantization are now available in llama.cpp!
As of version b5125, users can perform TWQ, whereby you quantize a whole tensor at a specific level, or perform LWQ by choosing specific quant levels for individual layers of a tensor.
The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with two or more dimensions) and layer numbers, with support for regex patterns.
For example, to TWQ the attention value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ you would use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k"
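Since the regex syntax can be tricky, it helps to check which layers a pattern actually selects before quantizing. This is an illustrative sketch only, assuming Llama-style tensor names of the form blk.<layer>.attn_v and a 32-layer model:

```python
import re

# Check which layers the LWQ pattern above would select.
# Assumes Llama-style tensor names (blk.<layer>.attn_v) and 32 layers.
pattern = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")

selected = [i for i in range(32) if pattern.search(f"blk.{i}.attn_v")]
print(selected)  # layers whose attn_v tensor would be quantized to q4_k
```

With this pattern, layers 0 through 12, plus 15, 17, and 31, get q4_k, while the remaining layers keep the base quantization level.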
ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think
Hello AI community! We're excited to introduce you to ThinkFlow, an innovative service that transforms how language models solve problems. VIDraft/ThinkFlow-llama
What is ThinkFlow? ThinkFlow is a groundbreaking platform that automatically applies step-by-step reasoning capabilities to existing LLM models without any modifications. It makes complex problem-solving transparent, allowing you to witness the model's thought process in real time.
Key Features
- Reasoning Without Model Modifications: Add step-by-step reasoning while utilizing existing LLMs as they are
- Visualized Thinking Process: See exactly how the model analyzes and solves problems
- Before & After Comparison: Compare standard responses with reasoning-enhanced outputs in real time
- Improved Accuracy: Deliver more accurate solutions for complex math and logic problems
- Educational Value: Teach students systematic approaches to problem-solving
- User-Friendly Interface: Intuitive and easy-to-use UI for a seamless experience
What Problems Can It Solve? ThinkFlow is particularly effective across a range of domains, especially complex math and logic problems.
Technical Details: ThinkFlow is built on the meta-llama/Llama-3.1-8B-Instruct model and uses carefully designed prompt chains to guide the model through step-by-step thinking. Each reasoning step builds upon the results of previous steps, culminating in a comprehensive final answer.
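The prompt-chaining idea described above can be sketched roughly as follows. This is a minimal illustration, not ThinkFlow's actual code: call_llm is a hypothetical stand-in for a real call to meta-llama/Llama-3.1-8B-Instruct, and the step count and prompt wording are made up for the example.

```python
# Minimal sketch of prompt chaining (illustrative only).
# `call_llm` is a hypothetical stand-in for a real model call.
def call_llm(prompt: str) -> str:
    # A real implementation would query the LLM here.
    return f"[model output for: {prompt.splitlines()[0]}]"

def think_flow(question: str, num_steps: int = 3) -> str:
    """Chain prompts so each reasoning step sees all previous steps."""
    steps = []
    for i in range(num_steps):
        prompt = (
            f"Question: {question}\n"
            "Previous steps:\n" + "\n".join(steps) + "\n"
            f"Write reasoning step {i + 1}:"
        )
        steps.append(call_llm(prompt))
    # Final pass: summarize the accumulated reasoning into one answer.
    final_prompt = (
        f"Question: {question}\nReasoning:\n" + "\n".join(steps) +
        "\nFinal answer:"
    )
    return call_llm(final_prompt)

answer = think_flow("What is 17 * 24?")
```

The key design point is that each step's prompt includes the outputs of all earlier steps, so the chain accumulates context rather than asking independent questions.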
Join Our Community! If you have questions or suggestions about ThinkFlow, join our Discord community: https://discord.gg/openfreeai Let's build better AI reasoning experiences together!
Hello, I've just written an article explaining the project my team and I built at the Mistral AI Robotic Hackathon a week ago: https://huggingface.co/blog/Beegbrain/guess-who-so100-mistral. Feel free to take a look; we are open-sourcing the code and starting a community project around the idea, so reach out if you'd like to participate.
# SE Arena: Evaluating Foundation Models for Software Engineering
**SE Arena** is the first open-source platform for evaluating foundation models in real-world software engineering workflows.
## What makes it unique?
- **RepoChat**: Automatically injects repository context (issues, commits, PRs) into conversations for more realistic evaluations
- **Multi-round interactions**: Tests models through iterative workflows, not just single prompts
- **Novel metrics**: Includes a "consistency score" that measures model determinism through self-play matches
Traditional evaluation frameworks don't capture how developers actually use models in their daily work. SE Arena creates a testing environment that mirrors real engineering workflows, helping you choose the right model for your specific software development needs.
From debugging to requirement refinement, see which models truly excel at software engineering tasks!
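A self-play consistency score could be computed along these lines. This is a hypothetical formula (fraction of repeated runs that agree with the most common answer), offered only to illustrate the idea, not SE Arena's actual implementation:

```python
from collections import Counter

def consistency_score(answers: list[str]) -> float:
    """Fraction of repeated runs agreeing with the modal answer.

    A fully deterministic model scores 1.0; a model that never
    repeats an answer scores 1/len(answers).
    """
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# A model that gave the same answer in 3 of 4 self-play rounds:
score = consistency_score(["fix A", "fix A", "fix B", "fix A"])
print(score)  # 0.75
```

Running the same prompt several times and scoring the answers this way gives a cheap proxy for determinism, independent of whether any individual answer is correct.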
Official repo: https://github.com/Tencent/InstantCharacter. I have significantly improved the official repo's app. Put FLUX LoRAs into the loras folder; it will download 3 LoRAs by default, and it will automatically download the necessary models into the models folder. A lower character scale (e.g. 0.6 or 0.8) makes the output more stylized. The official repo's Gradio app was also completely broken; I fixed and improved it and added new features, such as automatically saving every generated image, a number-of-generations setting, and more. Currently you need a GPU with at least 48 GB of VRAM; I am trying to make it work with lower VRAM via quantization.
And you get a universal explaining tool that works anywhere on your X Org desktop (on operating systems that are usually fully free software, like Debian GNU/Linux).