Tensor-wise (TWQ) and layer-wise (LWQ) quantization are now available in llama.cpp!
As of version b5125, users can perform TWQ, whereby you quantize a whole tensor at a specific level, or perform LWQ by choosing specific quant levels for individual layers of a tensor.
The new --tensor-type option enables llama-quantize to apply user-defined quant levels to any combination of allowed tensors (i.e. tensors with two or more dimensions) and layer numbers, with support for regex patterns.
For example, to TWQ the attention value tensor you would use --tensor-type attn_v=q6_k, and to perform LWQ you would use something like --tensor-type "\.([0-9]|1[01257]|31)\.attn_v=q4_k"
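Since the regex syntax can be tricky, it helps to check which layers a pattern actually selects before quantizing. This is an illustrative sketch only, assuming Llama-style tensor names of the form blk.<layer>.attn_v and a 32-layer model:

```python
import re

# Check which layers the LWQ pattern above would select.
# Assumes Llama-style tensor names (blk.<layer>.attn_v) and 32 layers.
pattern = re.compile(r"\.([0-9]|1[01257]|31)\.attn_v")

selected = [i for i in range(32) if pattern.search(f"blk.{i}.attn_v")]
print(selected)  # layers whose attn_v tensor would be quantized to q4_k
```

With this pattern, layers 0 through 12, plus 15, 17, and 31, get q4_k, while the remaining layers keep the base quantization level.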
ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think
Hello AI community! We're excited to introduce you to ThinkFlow, an innovative service that transforms how language models solve problems. VIDraft/ThinkFlow-llama
What is ThinkFlow? ThinkFlow is a groundbreaking platform that automatically applies step-by-step reasoning capabilities to existing LLM models without any modifications. It makes complex problem-solving transparent, allowing you to witness the model's thought process in real time.
Key Features
- Reasoning Without Model Modifications: Add step-by-step reasoning while utilizing existing LLMs as they are
- Visualized Thinking Process: See exactly how the model analyzes and solves problems
- Before & After Comparison: Compare standard responses with reasoning-enhanced outputs in real time
- Improved Accuracy: Deliver more accurate solutions for complex math and logic problems
- Educational Value: Teach students systematic approaches to problem-solving
- User-Friendly Interface: Intuitive and easy-to-use UI for a seamless experience
What Problems Can It Solve? ThinkFlow is particularly effective across a range of domains, especially complex math and logic problems.
Technical Details: ThinkFlow is built on the meta-llama/Llama-3.1-8B-Instruct model and uses carefully designed prompt chains to guide the model through step-by-step thinking. Each reasoning step builds upon the results of previous steps, culminating in a comprehensive final answer.
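The prompt-chaining idea described above can be sketched roughly as follows. This is a minimal illustration, not ThinkFlow's actual code: call_llm is a hypothetical stand-in for a real call to meta-llama/Llama-3.1-8B-Instruct, and the step count and prompt wording are made up for the example.

```python
# Minimal sketch of prompt chaining (illustrative only).
# `call_llm` is a hypothetical stand-in for a real model call.
def call_llm(prompt: str) -> str:
    # A real implementation would query the LLM here.
    return f"[model output for: {prompt.splitlines()[0]}]"

def think_flow(question: str, num_steps: int = 3) -> str:
    """Chain prompts so each reasoning step sees all previous steps."""
    steps = []
    for i in range(num_steps):
        prompt = (
            f"Question: {question}\n"
            "Previous steps:\n" + "\n".join(steps) + "\n"
            f"Write reasoning step {i + 1}:"
        )
        steps.append(call_llm(prompt))
    # Final pass: summarize the accumulated reasoning into one answer.
    final_prompt = (
        f"Question: {question}\nReasoning:\n" + "\n".join(steps) +
        "\nFinal answer:"
    )
    return call_llm(final_prompt)

answer = think_flow("What is 17 * 24?")
```

The key design point is that each step's prompt includes the outputs of all earlier steps, so the chain accumulates context rather than asking independent questions.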
Join Our Community! If you have questions or suggestions about ThinkFlow, join our Discord community: https://discord.gg/openfreeai Let's build better AI reasoning experiences together!
Hello, I've just written an article explaining the project my team and I built at the Mistral AI Robotic Hackathon a week ago: https://huggingface.co/blog/Beegbrain/guess-who-so100-mistral. Feel free to take a look; we are open-sourcing the code and starting a community project around the idea, so reach out if you'd like to participate.
# SE Arena: Evaluating Foundation Models for Software Engineering
**SE Arena** is the first open-source platform for evaluating foundation models in real-world software engineering workflows.
## What makes it unique?
- **RepoChat**: Automatically injects repository context (issues, commits, PRs) into conversations for more realistic evaluations
- **Multi-round interactions**: Tests models through iterative workflows, not just single prompts
- **Novel metrics**: Includes a "consistency score" that measures model determinism through self-play matches
Traditional evaluation frameworks don't capture how developers actually use models in their daily work. SE Arena creates a testing environment that mirrors real engineering workflows, helping you choose the right model for your specific software development needs.
From debugging to requirement refinement, see which models truly excel at software engineering tasks!
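A self-play consistency score could be computed along these lines. This is a hypothetical formula (fraction of repeated runs that agree with the most common answer), offered only to illustrate the idea, not SE Arena's actual implementation:

```python
from collections import Counter

def consistency_score(answers: list[str]) -> float:
    """Fraction of repeated runs agreeing with the modal answer.

    A fully deterministic model scores 1.0; a model that never
    repeats an answer scores 1/len(answers).
    """
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

# A model that gave the same answer in 3 of 4 self-play rounds:
score = consistency_score(["fix A", "fix A", "fix B", "fix A"])
print(score)  # 0.75
```

Running the same prompt several times and scoring the answers this way gives a cheap proxy for determinism, independent of whether any individual answer is correct.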
Official repo: https://github.com/Tencent/InstantCharacter. I have significantly improved the official repo's app. Put FLUX LoRAs into the loras folder; it will download 3 LoRAs by default, and it will automatically download the necessary models into the models folder. A lower character scale (e.g. 0.6 or 0.8) makes the output more stylized. The official repo's Gradio app was also completely broken; I fixed and improved it and added new features, such as automatically saving every generated image, a number-of-generations setting, and more. Currently you need a GPU with at least 48 GB of VRAM; I am trying to make it work with lower VRAM via quantization.
And you get a universal explaining tool that works anywhere on your X Org desktop (on operating systems that are usually fully free software, like Debian GNU/Linux).