All HF Hub posts

HannesVonEssen posted an update 3 days ago

📣 I made a visualizer for Hugging Face models: https://hfviewer.com

✨ Simply paste a Hugging Face URL to get an interactive visualization of the architecture!

🔗 The recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B

Feel free to try it out and give me feedback on how it can be improved! ❤️

danielhanchen posted an update 1 day ago

We made a guide on how to run open LLMs in Claude Code, Codex, and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24 GB of RAM.

Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp.

Guide: https://unsloth.ai/docs/basics/api
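
The guide has the details, but the core wiring is simple: llama.cpp's llama-server exposes an OpenAI-compatible API, so any OpenAI client (including coding agents) can point at a local GGUF. A minimal sketch, assuming a server is already running on localhost:8080 (the launch command and model file name below are illustrative placeholders):

```python
# Minimal sketch: talk to a local llama.cpp server through the OpenAI client.
# Assumes something like `llama-server -m model.gguf --port 8080` is running;
# the model file name is a placeholder, not a specific release.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # llama-server's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # llama.cpp ignores the key by default
)

resp = client.chat.completions.create(
    model="local",  # llama-server serves whichever model it was launched with
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(resp.choices[0].message.content)
```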

DedeProGames posted an update 2 days ago

GRaPE 2 Pro is now available.

SL-AI/GRaPE-2-Pro

This is the flagship model of the GRaPE 2 family and the largest model I have trained to date, sitting at 27B parameters. It is built on Qwen3.5-27B and trained on a proprietary dataset, with roughly half of post-training focused on code and the rest split between STEAM subjects and structured logical reasoning. It punches seriously above its weight class.

GRaPE 2 Pro supports multimodal input (image + text) and features 6 thinking modes selectable via a tag. This gives you real control over how hard the model thinks, from skipping the reasoning phase entirely with minimal, all the way up to xtra-Hi for deep, extended thought on hard problems. For most agentic use, auto or low is the move to keep things snappy.

It also runs on consumer hardware. You can get it going with as little as 12 GB of VRAM on a quantized build.

If you want to try it out and give feedback, that would be really appreciated. Email us at [email protected]

Crownelius posted an update about 17 hours ago

Day 4-6 [05/05/2026]
Howdy,

Is anybody else willing to put a second mortgage on their house, just to spend 40k USD in compute credits? Just me? k...

I got dreams, man. The datasets I could build with 40k would be insane.
Somebody called me a genius the other day; they'd be shocked to find out that I'd put my house on the line for 30 days of RunPod usage.

What would you do with it?
I would turn arXiv into a dataset. Turn each arXiv paper into a QnA.
Or... maybe if I got 40k USD in credits I'd end up like those 16 lost scientists.

Food for thought.
Anyways, I think I'm going to make a post once a week.
In the meantime you can find me building small LLMs on Discord here:
https://discord.gg/4DdwS9D8x9

yuriyvnv posted an update about 6 hours ago

📄 The WAVe paper is officially out in the journal Information Sciences.

You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.

Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.
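
To make the word-level idea concrete, here's a toy sketch (my illustration, not the paper's actual pipeline): given aligned (audio word segment, transcript word) embeddings from a shared space, keep an utterance only if every word pair clears a similarity threshold, so one hallucinated word sinks the sample instead of being averaged away by a sentence-level score. The threshold and random vectors below are placeholders:

```python
import numpy as np

def keep_utterance(word_pairs, threshold=0.7) -> bool:
    """Word-level filter: word_pairs holds (audio_emb, text_emb) vectors from a
    shared audio/text embedding space (a stand-in for the WAVe encoder here).
    Reject the whole utterance if ANY aligned word pair is too dissimilar."""
    for a, t in word_pairs:
        a = a / np.linalg.norm(a)
        t = t / np.linalg.norm(t)
        if float(a @ t) < threshold:
            return False
    return True

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=16), rng.normal(size=16)) for _ in range(5)]
print(keep_utterance(pairs))
```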

📦 Resources
- Paper: https://www.sciencedirect.com/science/article/pii/S0020025526005220
- PT model: yuriyvnv/WAVe-1B-Multimodal-PT
- NL model: yuriyvnv/WAVe-1B-Multimodal-NL
- Collection: https://huggingface.co/collections/yuriyvnv/multi-modal-embeddings-for-synthetic-transcript-filtering
- Code: https://github.com/yuriyvnv/WAVe

If you train ASR on synthetic or back-translated data, I would like to see WAVe benchmarked on other languages.

@reach-vb @ylacombe @hf-audio @BramVanroy

#speech #asr #multimodal #syntheticdata #lowresource

salma-remyx posted an update about 6 hours ago

VQASynth is the open-source implementation of the SpatialVLM paper, Endowing Vision-Language Models with Spatial Reasoning Capabilities (2401.12168), putting together the data synthesis pipeline behind remyxai/SpaceQwen2.5-VL-3B-Instruct, remyxai/SpaceThinker-Qwen2.5VL-3B, and several other spatial reasoning models we've shared here on HF.

From early development through production, different categories of evidence become available to guide what to try next. The strongest decisions combine evidence across categories rather than relying on any one.

Stage 1: Development history
Commit history holds the moments where things changed. For VQASynth, that's how scenes get parsed, how captions get generated, how spatial relations get encoded. Even before a model is in production, those milestones are a strong signal for what methods are semantically relevant to where the system is now.

Stage 2: Observational outcomes
Once a model is serving, the same commit history lets you line up changes against real-world results. That opens up quasi-experiments. You get causal evidence about which changes drove which outcomes, and inference on questions you haven't directly tested.

Stage 3: Controlled experiments
When teams start running interventions, those outcomes tighten the estimates further. This is the regime most people associate with rigor, but it's expensive and gated by traffic.

Stage 4: Counterfactual perturbations
When A/B testing becomes the operational bottleneck, instrumenting decision points in the production system lets you probe what would have happened under alternative choices. Shadow mode first, live traffic once audits pass.
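
As a toy illustration of this stage (my sketch, not the Remyx implementation): wrap each decision point so the deployed policy acts while an alternative is only scored and logged in shadow mode, giving you counterfactual data without touching live traffic:

```python
import json, time

def decide(features, live_policy, shadow_policy, log_path="decisions.jsonl"):
    """Instrumented decision point: the live policy's choice is executed;
    the shadow policy's choice is only logged, so we can later ask what
    would have happened under the alternative."""
    chosen = live_policy(features)
    counterfactual = shadow_policy(features)
    with open(log_path, "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "features": features,
            "live_action": chosen,
            "shadow_action": counterfactual,
        }) + "\n")
    return chosen  # only the live action affects production

# Toy usage: two caption policies for a synthesis pipeline.
live = lambda x: "short_caption" if x["scene_objects"] > 5 else "long_caption"
shadow = lambda x: "long_caption"
print(decide({"scene_objects": 7}, live, shadow))
```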

Experimentation maturity is a journey, and every stage offers something to learn from.
More on these ideas: https://docs.remyx.ai/concepts/maturity-progression

ArtelTaleb posted an update about 14 hours ago

โœˆ๏ธ World Flight Arcade - Can you land in 60 seconds?

I just dropped a new browser game built entirely with Three.js: World Flight Arcade

The concept is brutally simple:
- ๐Ÿ• 60 seconds of flight above a neon wireframe city
- โœˆ๏ธ One single attempt to land on the runway
- ๐Ÿ’€ No second chances. No respawn. Just you, the controls, and the clock.

The camera system is fully dynamic - it stays locked behind the plane within a ±45° pitch/yaw envelope, giving you that cinematic flight feel while keeping full spatial awareness.

Can you nail the landing on your first try?

👉 Play here: ArtelTaleb/world-flight-arcade

Built by Artel3D - handcrafted in Three.js, zero dependencies, runs directly in your browser.

Drop your score in the comments 👇

#gamedev #threejs #browserGame #webgl #artel3d #indiegame

unmodeled-tyler posted an update about 2 hours ago

Hey Hugging Face!

Repo: https://github.com/unmodeled-tyler/vessel-browser

I wanted to share a cool feature from my open source AI native web browser, Vessel: Persistent highlights!

You can highlight anything on the page and the context is provided to the agent. It's a fun way to learn about new stuff, synthesize info, or just deepen your understanding.

Since highlights are persistent, you can close the page, come back later - and your highlights will be exactly where you left them. I've found this particularly useful when reviewing technical blogs, model cards, etc.

Check it out!

BlueNipples posted an update about 10 hours ago

Good news: llama.cpp seems to be close to supporting MTP on Qwen models. Bad news: every single GGUF will have to be redone when it lands.

ajibawa-2023 posted an update about 13 hours ago

Stitched-Reasoning-Trajectories-7M

Dataset: ajibawa-2023/Stitched-Reasoning-Trajectories-7M
Stitched-Reasoning-Trajectories-7M is a massive-scale, synthetic multi-hop reasoning dataset. It was built by algorithmically "stitching" together discrete reasoning traces from the original glaiveai/reasoning-v1-20m dataset into continuous, coherent, and logically structured multi-agent trajectories.

By extracting internal sub-questions from <think> blocks and mapping high-information keyword overlaps, this dataset transforms single-turn Q&A pairs into deep, multi-step research plans. To ensure high quality and eliminate "topic drift," every trajectory has been verified using a dense semantic embedding model (BAAI/bge-large-en-v1.5).

The resulting dataset consists of 709 .jsonl files containing over 7.2 million entirely deduplicated, highly coherent reasoning chains.
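
A rough sketch of the stitch-and-verify idea described above (my illustration; the overlap heuristic and thresholds are placeholders, not the dataset's actual values): link two traces when their extracted sub-questions share high-information keywords, then check the candidate hop with BAAI/bge-large-en-v1.5 embeddings so the stitched trajectory doesn't drift off topic:

```python
# Assumes sentence-transformers is installed (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
STOPWORDS = {"the", "a", "an", "of", "is", "to", "in", "at", "how", "what", "does"}

def keyword_overlap(a: str, b: str) -> int:
    """Count shared high-information words between two sub-questions."""
    wa = {w.strip("?.,!") for w in a.lower().split() if w not in STOPWORDS}
    wb = {w.strip("?.,!") for w in b.lower().split() if w not in STOPWORDS}
    return len(wa & wb)

def stitch_if_coherent(subq_a: str, subq_b: str, min_overlap=2, min_sim=0.6) -> bool:
    """Propose a link via keyword overlap, then verify it semantically so the
    stitched multi-hop trajectory stays on topic (no topic drift)."""
    if keyword_overlap(subq_a, subq_b) < min_overlap:
        return False
    emb = model.encode([subq_a, subq_b], normalize_embeddings=True)
    return float(util.cos_sim(emb[0], emb[1])) >= min_sim

print(stitch_if_coherent(
    "What is the boiling point of water at high altitude?",
    "How does altitude change the boiling point of water?",
))
```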