
Quazimoto PRO

Quazim0t0

AI & ML interests

the hunchback of huggingface 🔙 joined: 1-20-2025 🦥 unsloth user 4️⃣ Phi User 🔨 ai hobbyist 📫 On Leaderboards Top 100-200

Recent Activity

updated a model about 2 hours ago
Quazim0t0/Geedorah-14B
updated a model about 2 hours ago
Quazim0t0/Lineage-14B
updated a model about 2 hours ago
Quazim0t0/mocha-14B

Organizations

Seance Table

Quazim0t0's activity

reacted to onekq's post with 👍 1 day ago
A bigger and harder pain point for reasoning models is switching modes.

We now have powerful models capable of either System 1 thinking or System 2 thinking, but not both, much less switching between the two. But humans can do this quite easily.

ChatGPT and others push the burden onto users, who have to switch between models themselves. I guess this is the best we have for now.
  • 2 replies
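As a rough illustration of what automatic mode switching could look like (my own sketch, not anything from the post), a trivial router might send each prompt either to a fast model or to a reasoning model based on a crude heuristic. The model IDs and the heuristic below are placeholders.

```python
# Naive "mode router" sketch: pick a fast model or a reasoning model per prompt.
# Model IDs and the difficulty heuristic are placeholders, not a real design.
from huggingface_hub import InferenceClient

FAST_MODEL = "meta-llama/Llama-3.1-8B-Instruct"   # placeholder "System 1" model
REASONING_MODEL = "deepseek-ai/DeepSeek-R1"       # placeholder "System 2" model

def needs_deliberation(prompt: str) -> bool:
    """Crude stand-in for a real difficulty classifier."""
    keywords = ("prove", "step by step", "olympiad", "derive", "debug")
    return len(prompt) > 400 or any(k in prompt.lower() for k in keywords)

def answer(prompt: str) -> str:
    model = REASONING_MODEL if needs_deliberation(prompt) else FAST_MODEL
    client = InferenceClient(model=model)
    response = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return response.choices[0].message.content
```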
reacted to AdinaY's post with 🔥 1 day ago
reacted to thomwolf's post with 🚀 1 day ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️ OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain Anthropic has been historically really strong at, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing everything about it: the training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
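For readers who want to poke at these releases, a minimal sketch (not from the Open-R1 repo) of trying OlympicCoder-7B with transformers and peeking at one of the datasets might look like this; the chat-template usage and the "train" split name are assumptions.

```python
# Minimal sketch: generate with OlympicCoder-7B and inspect one released dataset.
# Assumes the tokenizer defines a chat template and the dataset has a "train" split.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("open-r1/OlympicCoder-7B")
model = AutoModelForCausalLM.from_pretrained("open-r1/OlympicCoder-7B", device_map="auto")

messages = [{"role": "user", "content": "Write a function that checks whether a string is a palindrome."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))

# Peek at one of the accompanying datasets
ds = load_dataset("open-r1/codeforces", split="train")
print(ds[0])
```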
reacted to Lunzima's post with 🚀 1 day ago
I'm currently experimenting with the SFT dataset Lunzima/alpaca_like_dataset to further boost the performance of NQLSG-Qwen2.5-14B-MegaFusion-v9.x. This includes data sourced from DeepSeek-R1 or other cleaned results (excluding CoTs). Additionally, datasets that could potentially enhance the model's performance in math and programming/code, as well as those dedicated to specific uses like Swahili, are part of the mix.
@sometimesanotion @sthenno @wanlige
  • 1 reply
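For context, an SFT pass over an alpaca-style dataset like this one could be sketched with TRL roughly as below; the column names ("instruction", "input", "output") and the base-model ID are my assumptions, not details from the post.

```python
# Hypothetical SFT sketch with TRL over an alpaca-style dataset.
# Column names and the base model ID are assumptions, not taken from the post.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("Lunzima/alpaca_like_dataset", split="train")

def to_text(example):
    # Flatten instruction/input/output into a single "text" field for SFT.
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n\n" + example["input"]
    return {"text": prompt + "\n\n" + example["output"]}

ds = ds.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",   # placeholder; the post's base is NQLSG-Qwen2.5-14B-MegaFusion-v9.x
    train_dataset=ds,
    args=SFTConfig(output_dir="sft-output", per_device_train_batch_size=1, gradient_accumulation_steps=8),
)
trainer.train()
```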
reacted to awacke1's post with 🚀 2 days ago
Introducing, under the MIT license, my ML model specialized fine-tuning app "SFT Tiny Titans" 🚀

Demo video with source.

Download, train, SFT, and test your models, easy as 1-2-3!
URL: awacke1/TorchTransformers-NLP-CV-SFT
  • 2 replies
reacted to BrigitteTousi's post with 🚀 2 days ago
reacted to sandhawalia's post with 🔥 2 days ago
LeRobot goes to driving school: the world's largest open-source self-driving dataset, ready for end-to-end learning with LeRobot.

3 years, 30 German cities, 60 driving instructors and students. https://huggingface.co/blog/lerobot-goes-to-driving-school

Coming this summer: LeRobot driver.
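A rough sketch of how such a dataset would typically be consumed with the lerobot library is below; the repo ID is a placeholder since the post doesn't name it, and the field names depend on the actual dataset.

```python
# Sketch only: iterate frames from a LeRobot-format dataset.
# The repo ID below is a placeholder; see the linked blog post for the real one.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("yaak-ai/lerobot-driving-school")  # hypothetical repo ID
print(f"{len(dataset)} frames")

frame = dataset[0]  # one timestep: camera images, state, actions, timestamps
for key, value in frame.items():
    print(key, getattr(value, "shape", value))
```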
reacted to fdaudens's post with 🤗 3 days ago
Honored to be named among the 12 pioneers and power players in the news industry in Future Today Strategy Group's 2025 Tech Trends Report.

Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.

Take the time to read this report, it's packed with insights as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"

This AI-driven divide is something I've been really concerned about. Now is the time to build more than ever!

๐Ÿ‘‰ Full report here: https://ftsg.com/wp-content/uploads/2025/03/FTSG_2025_TR_FINAL_LINKED.pdf
  • 2 replies
replied to their post 4 days ago
posted an update 4 days ago
Update to the Imagine side project.
Just uploaded the 16-bit & Q4 versions.

Samples (used a base Microsoft Phi-4 model):
*You may experience bugs with either the model or the Open WebUI function*
Open WebUI function: https://openwebui.com/f/quaz93/imagine_phi
Quazim0t0/Imagine-v0.5-16bit - Haven't tested
Quazim0t0/ImagineTest-v0.5-GGUF - Tested (Pictures)

Dataset: Quazim0t0/Amanita-Imagine
Small Dataset of 500+ entries, still working on it here and there when I can.
Pictures use the Open WebUI function I provided.
  • 1 reply
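As a hedged example (not part of the post), the Q4 GGUF could be tried locally with llama-cpp-python along these lines; the quant filename is a guess, so match it to whatever is actually in the repo.

```python
# Hypothetical local test of the Q4 GGUF with llama-cpp-python.
# The filename pattern is a guess; pick the file actually present in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Quazim0t0/ImagineTest-v0.5-GGUF",
    filename="*Q4_K_M.gguf",   # assumed quant name
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Imagine rain on a tin roof, then describe it."}]
)
print(out["choices"][0]["message"]["content"])
```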
replied to their post 11 days ago

I'll try to get that out for you when I get a chance.

reacted to AdinaY's post with 🔥🚀 11 days ago
reacted to ZennyKenny's post with 👍 12 days ago
I've spent most of my time working with AI on user-facing apps like chatbots and text generation, but today I decided to work on something that I think has a lot of applications for data science teams: ZennyKenny/comment_classification

This Space supports uploading a user CSV and categorizing the fields based on user-defined categories. The applications of AI in production are truly endless. ๐Ÿš€
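The general pattern behind that kind of Space (sketched here from scratch, not taken from its source) is to run each free-text field through a zero-shot classifier against the user-defined categories; the column name "comment" and the category list below are placeholders.

```python
# Generic sketch of CSV field categorization with a zero-shot classifier.
# The column name "comment" and the category list are placeholders.
import pandas as pd
from transformers import pipeline

categories = ["bug report", "feature request", "praise", "spam"]
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

df = pd.read_csv("comments.csv")
df["category"] = [
    classifier(text, candidate_labels=categories)["labels"][0]  # top-scoring label
    for text in df["comment"].astype(str)
]
df.to_csv("comments_categorized.csv", index=False)
```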
posted an update 12 days ago
Debugging Tags:
Imagine, Associated Thoughts, Dialectical Analysis, Backwards Induction, Metacognition, and Normal Thought Processes such as <think> or <begin_of_thought>

Edit: uploaded new images with an Open WebUI function to organize the tags.
Open WebUI Function: https://openwebui.com/f/quaz93/imagine_phi

This Phi-4 model is part of a test project that I called Micro-Dose. My goal was to use a small dataset to activate reasoning and other cognitive processes without relying on a large dataset.

I found that this was possible with a tiny dataset of just 90 rows, specifically designed as math problems. In the initial iterations, the dataset only activated reasoning when a math-related question was asked. I then made a few changes to the dataset's structure, including the order of information and the naming of tags. You can see the sample results in the pictures. Not really anything special, just thought I'd share.

Tweaked the dataset a bit:
Quazim0t0/Imagine-Phi-v0.2-GGUF
Quazim0t0/MicroDoseV0.2


The first image shows the new tags, the second shows the regular thought process, and the third shows the model in combination with web searches.
  • 2 replies
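To show how such tagged output could be post-processed (my own sketch; the exact tag strings the model emits are assumptions based on the tag names listed in the post), the sections can be pulled apart with a small parser like this:

```python
# Illustrative parser for the custom reasoning sections; the exact tag strings
# are assumptions derived from the tag names mentioned in the post.
import re

TAGS = ["imagine", "associated_thoughts", "dialectical_analysis",
        "backwards_induction", "metacognition", "think"]

def extract_sections(text: str) -> dict:
    sections = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL | re.IGNORECASE)
        if match:
            sections[tag] = match.group(1).strip()
    # whatever is left outside the tags is treated as the final answer
    sections["answer"] = re.sub(r"<\w+>.*?</\w+>", "", text, flags=re.DOTALL).strip()
    return sections
```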
reacted to lingvanex-mt's post with 🔥👍 12 days ago
Dear HF Community!

Our company has open-sourced machine translation models for 12 rare languages under the MIT license.

You can use them freely with the OpenNMT translation framework. Each model is about 110 MB and has excellent performance (about 40,000 characters/s on an Nvidia RTX 3090).

Download the models here:

https://huggingface.co/lingvanex

You can test translation quality here:

https://lingvanex.com/translate/
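Assuming the checkpoints are standard OpenNMT-py models (not verified here), a translation run could be driven from Python roughly like this; the file names are placeholders, and the input is expected to be pre-tokenized the way the models were trained.

```python
# Sketch: invoke OpenNMT-py's translate CLI on a placeholder checkpoint.
# File names are placeholders; input must be tokenized as the model expects.
import subprocess

subprocess.run(
    [
        "onmt_translate",
        "-model", "lingvanex_model.pt",   # placeholder checkpoint name
        "-src", "input.src.txt",          # one source sentence per line
        "-output", "output.tgt.txt",
        "-gpu", "0",
    ],
    check=True,
)
```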
reacted to caelancooper's post with 👍 19 days ago
Hey Huggingface Community,

I'm just starting my journey. I'm here to learn and contribute as much as I can to the AI community. With one of my models, I left the security permissions open so that people could commit changes and contribute in good faith, and the opposite happened.

I'm open to all feedback you may have on my future projects. Let's keep it collegial and try to make something amazing. I always strive to make situations a win for all parties involved and would love to collaborate with anybody who's interested in innovation, optimization, and new use cases for AI.

Thanks Everyone,
Caelan
posted an update 23 days ago
My first attempt at using SmolAgents:
Quazim0t0/CSVAgent

The attached video is an example for this Space.

Based on ZennyKenny's SqlAgent:
ZennyKenny/sqlAgent

You can upload a CSV file and it will automatically populate the table, then you can ask questions about the data.

Grab a sample CSV file here: https://github.com/datablist/sample-csv-files

The questions that can be asked may be limited.

_______________________
Second: Quazim0t0/TXTAgent
Created an agent that converts a .txt file into a CSV file; you can then ask about the data and download the generated CSV file.

_______________________
Third: Quazim0t0/ReportAgent
Upload multiple TXT/DOC files to generate a report from them.

_______________________
Lastly: Quazim0t0/qResearch
A research tool that uses DuckDuckGo for web searches and Wikipedia, and tries to refine the answers into MLA format.
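In the same spirit (my own sketch, not the Spaces' source code), a smolagents CodeAgent that is allowed to import pandas can already answer questions about an uploaded CSV; the default model and the prompt are placeholders.

```python
# Rough smolagents sketch of the CSV question-answering pattern.
# The model default and the prompt are placeholders, not the Spaces' code.
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[],
    model=HfApiModel(),                        # hosted inference model by default
    additional_authorized_imports=["pandas"],  # let generated code load the CSV
)

answer = agent.run(
    "Load 'customers.csv' with pandas and report how many customers are in each country."
)
print(answer)
```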

reacted to Jaward's post with 🔥 24 days ago
Finally, here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params. It can train on an 8 GB RAM CPU and also supports GPU for sanity's sake (includes support for vLLM + Flash Attention). Using SmolLM2-135M/360M-Instruct as reference & base models. Experience your own "aha" moment 🐳 on 8 GB of RAM.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
  • 2 replies
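For readers new to GRPO, the core idea such a trainer implements is the group-relative advantage: sample several completions per prompt, score them, and normalize each reward against its own group. A minimal sketch of that step (written independently of the linked notebook):

```python
# Group-relative advantage, the heart of GRPO, sketched independently
# of the linked notebook.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: (num_prompts, group_size) raw rewards for sampled completions."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. 2 prompts, 4 sampled completions each; reward = 1 if the answer was correct
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```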